Preface


With increasing demands for efficiency and product quality, and with the progressing integration of automatic control systems in high-cost and safety-critical processes, the field of supervision (or monitoring), fault detection and fault diagnosis plays an important role. The classical way of supervision is to check the limits of single variables and to alarm the operators. However, this can be improved significantly by taking into account the information hidden in all measurements and by automatic actions that keep the systems in operation.

During the last decades, theoretical and experimental research has shown new ways to detect and diagnose faults. One distinguishes fault detection, which recognizes that a fault has happened, from fault diagnosis, which finds the cause and location of the fault. Advanced methods of fault detection are based on mathematical signal and process models and on methods of system theory and process modelling to generate fault symptoms. Fault diagnosis methods use causal fault-symptom relationships by applying methods from statistical decision, artificial intelligence and soft computing. Efficient supervision, fault detection and diagnosis is therefore a challenging field, encompassing physically oriented system theory, experiments and computations. The subjects considered are also known as fault detection and isolation (FDI) or fault detection and diagnosis (FDD).

A further important field is fault management. This means avoiding shut-downs by early fault detection and by actions like process condition-based maintenance or repair. If sudden faults, failures or malfunctions cannot be avoided, fault-tolerant systems are required. Through methods of fault detection and reconfiguration of redundant components, break-downs and, in the case of safety-critical processes, accidents may be avoided.

The book is intended to give an introduction to advanced supervision, fault detection and diagnosis, and fault-tolerant systems for processes with mainly continuous, sampled signals. Of special interest is an application-oriented approach with methods that have proven their performance in practical applications.

The material is the result of many of the author's own research projects on fault detection and diagnosis during the last 25 years, but also of publications by many other research groups. The development of the field can be followed especially through the IFAC Symposia series “SAFEPROCESS”, which was initiated in 1991 in Baden-Baden and then held every three years in Helsinki, Hull, Budapest, Washington and Beijing, and through the IFAC Workshop “On-line fault detection and supervision in the chemical industries”, which started in Kyoto (1986) and was then held in Newark, Newcastle, Folaize and Cheju, but also through other conferences.

The book is intended as an introduction for teaching the field of fault detection and diagnosis and fault-tolerant systems to graduate students or students in higher semesters of electrical and electronic engineering, mechanical and chemical engineering, and computer science. As the field is of increasing importance for technical and also non-technical systems, an attempt has been made to present the material in an easy-to-understand and transparent way, with realistic perspectives for applying the methods discussed. The book is therefore also oriented towards practising engineers in research and development, design and manufacturing. Prerequisites are basic undergraduate courses in system theory, automatic control, and mechanical and/or electrical engineering.

The author is grateful to his research associates who, since 1975, have performed many theoretical and practical research projects on the subject of this book, among them H. Siebert, L. Billmann, G. Geiger, W. Goedecke, S. Nold, U. Raab, B. Freyermuth, S. Leonhardt, R. Deibert, T. Höfling, T. Pfeufer, M. Ayoubi, P. Balle, D. Füssel, O. Moseler, A. Wolfram, F. Kimmich, A. Schwarte, M. Vogt, M. Münchhof, D. Fischer, F. Haus and I. Unger. The following chapters or sections were worked out by: 8.4.4: F. Kimmich; 9.2.3: M. Vogt; 10.4.2: P. Balle; 11.4.2: I. Unger; 13: F. Haus; 15.2, 16, 17.3 and 23.2: D. Füssel. I highly appreciate these contributions as valuable inputs to this book.

Finally, I especially would like to thank Brigitte Hoppe for the laborious and precise text setting, including the figures and tables in camera-ready form.

Darmstadt, February 2005 

Rolf Isermann 
Introduction


Since about 1960 the influence of automation on the operation and design of technical processes has increased progressively. This expansion of process automation was driven by increasing demands on process performance and product quality, the independence of process operation from the presence of human operators, the relief of operators from monotonous tasks, and rising wages. The degree of automation was pushed forward drastically from around 1975, when relatively cheap and reliable microcomputers became available and could solve many automation problems in one device. This was paralleled by further progress in the areas of sensors, actuators, bus-communication systems and human-machine interfaces. The improvement in the theoretical understanding of processes and automation functions also played a large role.


1.1 Process automation and process supervision 

Figure 1.1 shows a simplified scheme for the process automation of two coupled process parts. The lower level contains sequential control, feedforward and feedback control. Supervision can be assigned to a medium level. The higher levels comprise more globally acting tasks like coordination, optimization and management. Important information about the process is displayed at the operator’s console.

Great progress has been made in digital sequence control and digital (continuous) control. There has been enormous activity in the theory, development and implementation of feedback control systems. In particular, process model-based control systems, which comprise state observers and parameter estimation and compensate for nonlinearities, have shown large improvements. Here the modern theory of dynamic systems and signals has had a great influence, and many processes with difficult behavior can now be controlled much better than before.

However, the better the control functions are performed in the lower levels, the more important the supervision functions become, because operators are removed from the process. A human operator acting on the process does not only control it with regard to setpoints or time schedules; he or she also supervises it, especially if there is direct contact with the process. Therefore, as the lower-level control functions improve, the supervision functions must be improved, too.

Another reason is the integration of control systems into the process, where the process and its control become autonomous units. This is the case, for instance, in mechatronic systems and shows up, for example, in drive-by-wire aircraft and vehicles. Here not only the control tasks themselves but also reliability and safety depend on the correct functioning of all process parts, actuators, sensors and control computers. Faults in the system must then be displayed immediately to the operator (pilot, driver), and for critical processes redundant or reconfigurable components must be activated by a fault-management system.

In the past, automatic supervision was mostly realized by limit checking (or threshold checking) of some important process variables such as force, speed, pressure, liquid level and temperature. Usually alarms are raised if limit values are exceeded, and operators then have to act, or protection systems act automatically. In many cases this is sufficient to prevent larger failures or damage. However, faults are detected rather late, and a detailed fault diagnosis is mostly not possible with this simple method. Modern systems theory offers the systematic use of mathematical process and signal models, identification and estimation methods, and methods of computational intelligence. With these it is possible to develop advanced methods of fault detection and diagnosis. The goals of these methods are, for example:
• early detection of small faults with abrupt or incipient behavior;

• diagnosis of faults in actuators, processes, components and sensors; fault detection in closed loops;

• supervision of processes in transient states; 

• process condition-based maintenance and repair; 

• in-depth quality control of assembled products in manufacturing;

• teleservices like remote fault detection and diagnosis; 

• basis for fault management; 

• basis for fault-tolerant and reconfigurable systems. 


1.2 Contents 

To treat these advanced supervision, fault detection and diagnosis methods, the book is divided into five parts:

I Fundamentals 

II Fault-detection methods 

III Fault-diagnosis methods 

IV Fault-tolerant systems 

V Application examples 

The first chapter of Part I, Chapter 2, describes the basic tasks of supervision, monitoring, automatic protection, fault detection and fault diagnosis up to fault management. As the subject is distributed over many different technological areas, the terminology used is not unique. Therefore an attempt is made to define frequently used terms such as faults, failures, malfunctions, reliability, availability, safety, dependability and integrity, with reference to international standards. One goal of advanced supervision is the improvement of reliability, availability and maintainability, so some basics are summarized in Chapter 3. Measures for reliability, such as the failure rate and MTTF, and for maintainability and availability are given together with numerical examples. Safety-related systems require special analysis and synthesis methods, which are covered by the terms safety, system integrity and dependability, Chapter 4. A brief summary is given of event tree analysis, fault tree analysis, failure mode and effects analysis (FMEA), hazard analysis and risk classification.
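The reliability and availability measures mentioned for Chapter 3 can be illustrated with the standard relations for a constant failure rate λ: R(t) = exp(−λt), MTTF = 1/λ, and steady-state availability A = MTTF/(MTTF + MTTR). The following Python sketch assumes exactly these textbook relations; the component data (λ = 10⁻⁴ 1/h, MTTR = 10 h) are hypothetical and not taken from the book's own numerical examples.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """Survival probability R(t) = exp(-lambda*t), assuming a constant failure rate."""
    return math.exp(-failure_rate * t)


def mttf(failure_rate: float) -> float:
    """Mean time to failure for a constant failure rate: MTTF = 1/lambda."""
    return 1.0 / failure_rate


def availability(mttf_h: float, mttr_h: float) -> float:
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    return mttf_h / (mttf_h + mttr_h)


if __name__ == "__main__":
    lam = 1e-4                      # hypothetical failure rate in 1/h
    m = mttf(lam)                   # mean time to failure in h
    print(f"MTTF = {m:.0f} h")
    print(f"R(MTTF) = {reliability(lam, m):.3f}")   # equals exp(-1)
    print(f"A = {availability(m, 10.0):.4f}")       # hypothetical MTTR = 10 h
```

Note that R(MTTF) = exp(−1): for a constant failure rate, only about 37 % of the components survive beyond their mean time to failure.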

Part II treats the basic fault-detection methods. As advanced methods of fault detection use mathematical process and signal models, see Figures 1.2 and 1.3, Chapters 5 and 6 describe some basic continuous-time and discrete-time models. An important issue in this connection is the mathematical modelling of faults. Different kinds of frequently used fault models, their time behavior and their influence on process models are discussed. Examples are given of how faults can be modelled for actuators, processes and sensors.
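The difference between additive (offset) and multiplicative (parametric) faults can already be seen with a static linear process y = K·u. The sketch below is a minimal illustration under that assumption; the gain K = 2 and the fault sizes are hypothetical.

```python
def process_output(u: float, gain: float = 2.0, offset_fault: float = 0.0,
                   gain_fault: float = 0.0) -> float:
    """Static linear process y = (K + dK) * u + f_add.

    offset_fault (f_add) models an additive fault,
    gain_fault (dK) models a multiplicative (parametric) fault.
    """
    return (gain + gain_fault) * u + offset_fault


def output_deviation(u: float, **fault: float) -> float:
    """Deviation dY of the faulty output from the fault-free output."""
    return process_output(u, **fault) - process_output(u)


if __name__ == "__main__":
    for u in (0.0, 1.0, 5.0):
        d_add = output_deviation(u, offset_fault=0.5)
        d_mult = output_deviation(u, gain_fault=0.5)
        print(f"u = {u:3.1f}: additive dY = {d_add:.2f}, multiplicative dY = {d_mult:.2f}")
```

The additive fault shifts the output by the same amount at every operating point, whereas the multiplicative fault is invisible for u = 0 and grows with the input. Multiplicative faults therefore need sufficient excitation of the process to be detected.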

Static and dynamic process models are considered, and it is shown how additive (offset) and multiplicative (parametric) faults influence the measurable signals. Then some models for periodic and stochastic signals are given which are suitable for fault detection with signal-analysis methods. Chapters 7 and 8 treat fault-detection methods based on the measurement of single signals, see Figure 1.4 left. Chapter 7 gives a survey of the most frequently used way of fault detection, limit checking. This is usually applied to measurable absolute values and their trends. Then more sophisticated change-detection methods are considered. A basic method is, of course, the real-time estimation of the mean and variance of observed stochastic variables. Statistical tests like hypothesis testing, the t-test, the run-sum test, the F-test and the likelihood ratio test are also discussed. Furthermore, fuzzy thresholds, adaptive thresholds and plausibility checks are described.
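Change detection based on the recursive estimation of mean and variance can be sketched as follows. This is a minimal example, assuming an exponentially weighted moving average (EWMA) with forgetting factor λ; the alarm fires when the estimated mean leaves a ±k band around a reference mean μ0. This EWMA-type test is swapped in for illustration and is not one of the specific statistical tests listed above. The reference values μ0 and σ0 are assumed to come from fault-free operation, and the function name and all numbers are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def ewma_change_detect(samples: Sequence[float], mu0: float, sigma0: float,
                       lam: float = 0.95, k: float = 3.0
                       ) -> Optional[Tuple[int, float, float]]:
    """Return (index, mean estimate, variance estimate) at the first alarm, or None.

    Mean and variance are estimated recursively with forgetting factor lam.
    The alarm band uses the stationary standard deviation of the EWMA,
    sigma0 * sqrt((1 - lam) / (1 + lam)).
    """
    mean, var = mu0, sigma0 ** 2
    band = k * sigma0 * math.sqrt((1.0 - lam) / (1.0 + lam))
    for i, y in enumerate(samples):
        e = y - mean
        mean += (1.0 - lam) * e                  # recursive mean estimate
        var = lam * (var + (1.0 - lam) * e * e)  # recursive variance estimate
        if abs(mean - mu0) > band:
            return i, mean, var
    return None


if __name__ == "__main__":
    # Hypothetical signal: mean 10.0 for 200 samples, then a step fault to 10.5
    signal = [10.0] * 200 + [10.5] * 100
    print(ewma_change_detect(signal, mu0=10.0, sigma0=0.2))
```

A smaller forgetting factor reacts faster but widens the band; λ trades detection delay against sensitivity to small changes.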

Fault detection with signal models in Chapter 8 first considers periodic signals. Classical methods like Fourier and correlation analysis, including the FFT and spectral estimation, are summarized, followed by the identification of non-stationary periodic signals with the short-time Fourier transform and the wavelet transform, and the identification of stochastic signals. The goal is to detect changes in the signal behavior caused by process faults, see Figure 1.2.
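For periodic signals, the amplitude of one harmonic can be estimated with a single bin of the discrete Fourier transform; a fault that excites or amplifies a particular frequency then appears as an amplitude change. The sketch below uses plain Python instead of an FFT library and assumes that the analysed frequency completes an integer number of periods in the data window (no leakage). The 50 Hz and 120 Hz components and their amplitudes are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def harmonic_amplitude(signal: Sequence[float], freq_hz: float, fs_hz: float) -> float:
    """Amplitude of the sinusoidal component at freq_hz, from a single DFT bin."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / n


def make_signal(a120: float, fs_hz: float = 1000.0, n: int = 1000) -> List[float]:
    """Hypothetical vibration signal: 50 Hz fundamental plus a 120 Hz component."""
    return [math.sin(2 * math.pi * 50 * k / fs_hz)
            + a120 * math.sin(2 * math.pi * 120 * k / fs_hz) for k in range(n)]


if __name__ == "__main__":
    fs = 1000.0
    healthy = harmonic_amplitude(make_signal(0.1), 120.0, fs)
    faulty = harmonic_amplitude(make_signal(0.4), 120.0, fs)
    print(f"120 Hz amplitude: healthy {healthy:.3f}, faulty {faulty:.3f}")
```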

The following chapters treat fault detection with process models, see Figure 1.4 right. As faults may change the behavior of processes between input and output signals, changes in the process behavior can be used to indicate inherent faults which are not directly measurable. Therefore attempts are made to extract changes in the process behavior by using several measurements, see the scheme in Figure 1.3. This means that the “analytical redundancy” between measured signals, expressed by process models, is used.

Chapter 9 considers fault detection with process-identification methods. Here the process models adapt to the individual process behavior by using cross-correlation or parameter estimation. In particular, the recursive least squares parameter estimation method and its modifications are described for linear time-invariant and time-variant processes with discrete-time and continuous-time signals. A great advantage is that powerful methods exist for the identification of nonlinear processes, which matters because most real processes are nonlinear. Parameter estimation for static and dynamic nonlinear processes is considered, and an overview of neural networks applicable to static and dynamic systems, and of their implementation as look-up tables, is given. Fault symptoms then appear as deviations of parameters or output signals.
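The idea of fault symptoms as parameter deviations can be sketched with recursive least squares (RLS) for a first-order model y(k) = −a1·y(k−1) + b1·u(k−1). This is a minimal, noise-free illustration in plain Python; the parameter values, the random binary input and the "reduced actuator gain" fault are hypothetical, and the large initial covariance p0 is a common but assumed choice.

```python
import random
from typing import List, Sequence, Tuple


def rls_estimate(phis: Sequence[Sequence[float]], ys: Sequence[float],
                 lam: float = 1.0, p0: float = 1e6) -> List[float]:
    """Recursive least squares for y(k) = phi(k)^T theta + e(k).

    lam is the forgetting factor (1.0 = no forgetting),
    p0 the initial diagonal value of the covariance matrix P.
    """
    n = len(phis[0])
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for phi, y in zip(phis, ys):
        p_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]                    # correction vector
        err = y - sum(phi[i] * theta[i] for i in range(n))  # equation error
        theta = [theta[i] + gain[i] * err for i in range(n)]
        # P <- (P - gain * phi^T P) / lam, using the symmetry of P
        P = [[(P[i][j] - gain[i] * p_phi[j]) / lam for j in range(n)] for i in range(n)]
    return theta


def simulate_first_order(a1: float, b1: float, n: int = 200,
                         seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Noise-free data of y(k) = -a1*y(k-1) + b1*u(k-1) with a random binary input."""
    rng = random.Random(seed)
    u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    y = [0.0]
    for k in range(1, n):
        y.append(-a1 * y[k - 1] + b1 * u[k - 1])
    phis = [[-y[k - 1], u[k - 1]] for k in range(1, n)]
    return phis, y[1:]


if __name__ == "__main__":
    nominal = rls_estimate(*simulate_first_order(a1=-0.8, b1=0.5))
    faulty = rls_estimate(*simulate_first_order(a1=-0.8, b1=0.3))  # reduced gain
    print("nominal [a1, b1]:", [round(v, 3) for v in nominal])
    print("faulty  [a1, b1]:", [round(v, 3) for v in faulty])
```

The symptom is the deviation of the estimated b1 from its nominal value, while a1 stays unchanged; this points to a gain-related fault rather than a change in the process dynamics.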

The parity equation methods use fixed process models, Chapter 10. They can be designed with transfer functions, leading to output or equation errors which are called primary residuals, or with state-space models. In order to make the residuals more sensitive to certain faults and more robust, enhanced residuals can be generated, giving the residuals special structures or directions. Depending on the process model structure and the kind of faults, strongly isolating or weakly isolating residuals can be distinguished.
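A primary residual from a parity equation (equation error) can be sketched for a fixed first-order model y(k) + a1·y(k−1) = b1·u(k−1). This minimal sketch assumes the model matches the process exactly; the parameters, the square-wave input and the 0.2 sensor offset are hypothetical. In fault-free operation the residual is zero; an additive sensor offset Δy gives Δy at its first sample and then the constant residual (1 + a1)·Δy.

```python
from typing import List, Sequence


def parity_residuals(u: Sequence[float], y: Sequence[float],
                     a1: float, b1: float) -> List[float]:
    """Equation-error residuals r(k) = y(k) + a1*y(k-1) - b1*u(k-1) for k >= 1,
    computed with fixed (nominal) model parameters a1, b1."""
    return [y[k] + a1 * y[k - 1] - b1 * u[k - 1] for k in range(1, len(y))]


def simulate(u: Sequence[float], a1: float, b1: float,
             sensor_offset: float = 0.0, k_fault: int = 10**9) -> List[float]:
    """Measured process output; an additive sensor offset appears from k_fault on."""
    x, y_meas = 0.0, []
    for k, uk in enumerate(u):
        y_meas.append(x + (sensor_offset if k >= k_fault else 0.0))
        x = -a1 * x + b1 * uk
    return y_meas


if __name__ == "__main__":
    A1, B1 = -0.8, 0.5
    u = [1.0 if (k // 10) % 2 == 0 else -1.0 for k in range(100)]  # square wave
    r_ok = parity_residuals(u, simulate(u, A1, B1), A1, B1)
    r_fault = parity_residuals(u, simulate(u, A1, B1, sensor_offset=0.2, k_fault=50),
                               A1, B1)
    print(max(abs(r) for r in r_ok), r_fault[49], r_fault[-1])
```

Because the input is part of the parity equation, changes of the output caused by the input do not appear in the residual; only the fault does.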

Further alternatives for model-based fault detection are state observers and state estimation, Chapter 11. Changes in the input/output behavior of a process lead to changes of the output error and state variables, which can therefore be used as residuals. Enhanced residuals are obtained with fault-detection filters or banks of observers. Similar approaches for noisy processes are possible by state estimation with Kalman filters. Output or unknown input observers result from a transformation to new state variables and outputs such that unknown inputs have no influence on the residuals. A comparison of the computational form of the residual equations, together with simulations, shows similarities between the parity-equation and observer-based methods.
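An output residual from a state observer can be sketched for a scalar discrete-time process x(k+1) = a·x(k) + b·u(k), y(k) = c·x(k) with a Luenberger observer x̂(k+1) = a·x̂(k) + b·u(k) + h·r(k), r(k) = y(k) − c·x̂(k). This minimal sketch assumes a perfectly known model; all numbers are hypothetical. The initial estimation error decays with the observer pole a − h·c, while an additive sensor offset d leaves the steady residual d(1 − a)/(1 − a + h·c).

```python
from typing import List, Sequence


def observer_residuals(u: Sequence[float], y: Sequence[float], a: float, b: float,
                       c: float, h: float, x_hat0: float = 0.0) -> List[float]:
    """Output residuals r(k) = y(k) - c*x_hat(k) of a scalar Luenberger observer
    x_hat(k+1) = a*x_hat(k) + b*u(k) + h*r(k)."""
    x_hat, res = x_hat0, []
    for uk, yk in zip(u, y):
        r = yk - c * x_hat
        res.append(r)
        x_hat = a * x_hat + b * uk + h * r
    return res


def measure(u: Sequence[float], a: float, b: float, c: float, x0: float = 1.0,
            sensor_offset: float = 0.0, k_fault: int = 10**9) -> List[float]:
    """Measured output of x(k+1) = a*x(k) + b*u(k), y(k) = c*x(k),
    with an additive sensor offset from k_fault on."""
    x, y = x0, []
    for k, uk in enumerate(u):
        y.append(c * x + (sensor_offset if k >= k_fault else 0.0))
        x = a * x + b * uk
    return y


if __name__ == "__main__":
    A, B, C, H = 0.8, 0.5, 1.0, 0.5        # observer pole a - h*c = 0.3
    u = [1.0] * 100
    r_ok = observer_residuals(u, measure(u, A, B, C), A, B, C, H)
    r_fault = observer_residuals(u, measure(u, A, B, C, sensor_offset=0.2, k_fault=50),
                                 A, B, C, H)
    # fault-free residual decays to zero, faulty residual keeps a constant offset
    print(abs(r_ok[-1]), r_fault[-1])
```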

A special chapter, Chapter 12, is devoted to fault detection in control loops. As detuned controller parameters, actuator or sensor faults and some disturbances have a similar effect on the control performance, it is rather difficult to detect and diagnose faults in closed loop. A comparison of the model-based fault-detection methods is given in Chapter 14, considering the assumptions made and simulations. The suitability of the individual methods for special types of faults is discussed, and certain combinations are proposed.

Chapter 13 gives an introduction to principal component analysis which, under the assumption of linearity, analyzes the fluctuations of the input and output variables of large-scale processes and reduces the number of variables to uncorrelated ones while preserving most of the information. Changes of the new variables are then used to form residuals. Figure 1.4 summarizes the treated fault-detection methods.
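The residual idea behind principal component analysis can be sketched for two strongly correlated variables. This minimal example keeps only the major principal axis and uses the squared projection onto the discarded axis (the squared prediction error, SPE, one common PCA residual) as the residual. For brevity the variables are not standardized, although PCA is usually applied to scaled data; the training data and test samples are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def fit_pca_2d(data: Sequence[Vec2]) -> Tuple[Vec2, Vec2, Vec2]:
    """Return the mean and the unit vectors of the major and minor principal axes
    of two-dimensional data (closed-form eigenvectors of the 2x2 covariance)."""
    n = len(data)
    m1 = sum(p[0] for p in data) / n
    m2 = sum(p[1] for p in data) / n
    s11 = sum((p[0] - m1) ** 2 for p in data) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in data) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in data) / (n - 1)
    angle = 0.5 * math.atan2(2.0 * s12, s11 - s22)   # direction of the major axis
    major = (math.cos(angle), math.sin(angle))
    minor = (-math.sin(angle), math.cos(angle))
    return (m1, m2), major, minor


def spe(sample: Vec2, mean: Vec2, minor: Vec2) -> float:
    """Squared prediction error: squared projection onto the discarded component."""
    d1, d2 = sample[0] - mean[0], sample[1] - mean[1]
    return (d1 * minor[0] + d2 * minor[1]) ** 2


if __name__ == "__main__":
    # Hypothetical fault-free data: x2 is approximately 2*x1 (strong correlation)
    train = [(-1.0 + 2.0 * k / 200,
              2.0 * (-1.0 + 2.0 * k / 200) + 0.01 * (((k * 37) % 11) - 5) / 5)
             for k in range(201)]
    mean, major, minor = fit_pca_2d(train)
    print("consistent sample SPE:", spe((0.5, 1.0), mean, minor))
    print("broken-correlation sample SPE:", spe((0.5, 0.0), mean, minor))
```

A fault that violates the learned correlation between the variables produces a large SPE even when each variable alone stays within its normal range.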



Fig. 1.4. Survey on fault-detection methods 


Part III provides an overview of the most important fault-diagnosis methods. Based on the symptoms from the fault-detection methods, with binary or fuzzy thresholds, and on different kinds of residuals or features, the task is to find the cause of the faults, see Figure 1.5. The symptoms may be analytical, existing in the form of calculated numbers, or heuristic, observed by humans and expressed in linguistic terms. After an introduction to the basic problems in Chapter 15, fault diagnosis with classification methods is described, from classical pattern recognition and geometric classifiers to neural networks, Chapter 16. With more information, fault trees can be established, allowing inference methods for approximate reasoning with forward and backward chaining, Chapter 17. Hybrid neuro-fuzzy systems are then used to identify fault trees with if-then rules. Figure 1.6 gives an overview of the treated fault-diagnosis methods.
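Fault diagnosis by a geometric classifier can be sketched as a nearest-centroid rule: each fault class is represented by the mean of its training symptom vectors, and a new symptom vector is assigned to the class with the nearest centroid. This is a minimal example; the symptom vectors (deviations of two estimated parameters) and the class names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def train_centroids(examples: Dict[str, Sequence[Sequence[float]]]) -> Dict[str, List[float]]:
    """Mean symptom vector (centroid) for each fault class."""
    centroids = {}
    for fault, vectors in examples.items():
        n, dim = len(vectors), len(vectors[0])
        centroids[fault] = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return centroids


def classify(symptoms: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the fault class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda fault: math.dist(symptoms, centroids[fault]))


if __name__ == "__main__":
    # Hypothetical symptom vectors: deviations (d_a1, d_b1) of estimated parameters
    examples = {
        "no fault": [(0.00, 0.01), (0.01, -0.01), (-0.01, 0.00)],
        "actuator gain fault": [(0.00, -0.20), (0.01, -0.18), (-0.01, -0.22)],
        "process time constant fault": [(0.10, 0.00), (0.12, 0.01), (0.08, -0.01)],
    }
    centroids = train_centroids(examples)
    print(classify((0.02, -0.17), centroids))
```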
Fig. 1.5. Methods of fault diagnosis. S: symptoms, F: faults, E: events



Fig. 1.6. Survey on fault-diagnosis methods 


Fault-tolerant systems, which tolerate faults as they appear, are treated in Part IV. Chapter 18 presents fault-tolerant design with basic static and dynamic redundant structures, e.g. with voters and with hot and cold standby, see Figure 1.7. The various degradation steps with the states fail-safe, fail-operational and fail-silent are considered. Then it is shown how fault-tolerant sensors with hardware or analytical redundancy and fault-tolerant actuators can be built, as in the example in Figure 1.8. Finally, fault-tolerant components and fault-tolerant control systems are briefly discussed in Chapter 19. These last two chapters of Part IV consider parts of a general fault management.
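Static redundancy with a voter can be illustrated by a median (2-out-of-3) voter for three redundant sensor channels, which masks one faulty channel without switching. This is a minimal sketch assuming triplex analog readings; the tolerance used to flag a deviating channel and the readings are hypothetical.

```python
from typing import List, Sequence, Tuple


def median_vote(readings: Sequence[float]) -> float:
    """2-out-of-3 voting for three redundant channels: the median value."""
    if len(readings) != 3:
        raise ValueError("expected exactly three redundant readings")
    return sorted(readings)[1]


def vote_and_isolate(readings: Sequence[float], tol: float) -> Tuple[float, List[int]]:
    """Voted value plus the indices of channels deviating by more than tol from it."""
    voted = median_vote(readings)
    suspects = [i for i, x in enumerate(readings) if abs(x - voted) > tol]
    return voted, suspects


if __name__ == "__main__":
    # Hypothetical triplex sensor readings; channel 2 has a large offset fault
    print(vote_and_isolate([10.0, 10.1, 14.0], tol=0.5))
```

The median masks a single faulty channel; with two simultaneous faults in the same direction, however, the voted value is wrong, which is why degradation steps have to be planned.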

The last part, Part V, shows some application examples of model-based fault detection in detail. Experimental results are shown for a DC motor, Chapter 20, an AC-motor-driven centrifugal pump, Chapter 21, and an automotive suspension, Chapter 22. A more comprehensive presentation of many other technical applications is provided in another book, [1.25].

Summing up, after a summary of basics in the area of reliability and safety, the chapters give an introduction to basic methods for fault detection and fault diagnosis and an extract of a general fault management. These methods can be applied to many different types of actuators, processes and sensors in open or closed loop, as indicated in Figure 1.9.

Fig. 1.7. Basic fault-tolerant structures and possible degradation steps for safety-related systems

Fig. 1.8. Redundancy schemes for electronic hardware (extract): (a) static redundancy; (b) dynamic redundancy (hot standby)


1.3 Historical notes 

The historical development of the various supervision, fault-detection and diagnosis methods is difficult to describe because the original contributions are widely scattered in the technical literature. Limit checking is probably as old as the instrumentation of machines, dating back to the end of the 19th century. For the supervision of plants, ink and later point printing recorders were standard equipment from about 1935. Later, around 1960, analog controllers with transistor-based amplifiers (operational amplifiers) and sequential controllers with hardwired devices became available, which still used limit checking. Signal model-based methods like spectral analysis could be realized with analog bandpass filters and oscilloscopes.
The implementation of on-line process computers in 1960 then opened the way for improved supervision methods, like trend analysis. In 1968 the first programmable logic controllers were introduced to replace hardwired controllers based on electromechanical relays. This made the realization of protection systems easier. The advent of the microcomputer in 1971 and its increasing application in decentralized process automation systems from 1975 onwards marked the beginning of computationally more involved, software-based supervision and fault-detection algorithms. The first publications on process model-based fault-detection methods appeared in connection with aerospace systems, [1.2], [1.27], [1.37], [1.6], and with chemical plants, [1.15]. Several of these first concepts can be classified as parity relation approaches, checking the consistency of instrument readings or of mass or material balances. Residuals of mass balances were, for example, applied for leak detection in pipelines, [1.35]. The parity relation approach was further investigated by [1.14].

State observer-based methods were developed to generate output residuals, applying Luenberger state observers, [1.2], [1.27], or Kalman filters, [1.28]. The analytical redundancy between several measurements was used for sensor fault detection by applying a bank of observers, [1.6]. In order to compensate for non-measurable inputs, unknown input observers or output observers were developed, [1.36] and [1.11]. Observers with eigenstructure assignment go back to [1.29].

Another way of fault detection is the use of parameter estimation. The first publications were [1.16] and [1.1] in connection with jet turbines, [1.19], [1.20] and [1.21] for processes in general, with circulation pumps and DC motors as examples, and [1.8] and [1.7] for electrical motors.

Since these early publications, many contributions have been made in the field of fault detection and diagnosis. The development can be followed in survey articles such as [1.21], [1.22], [1.23], [1.9], [1.10], [1.12], [1.30]. A summary of publications from 1991 to 1996, with applications, is given in [1.26]. Furthermore, the multi-authored books [1.31], [1.32] give a good picture of the field. Several books on fault detection provide a valuable summary of the different approaches: [1.33], [1.15], [1.4], [1.34], [1.24], [1.3], [1.13], [1.5].

Another source of many publications is the IFAC Symposium series SAFEPROCESS, Baden-Baden (1991), Helsinki (1994), Hull (1997), Budapest (2000), Washington (2003), Beijing (2006), [1.17], and the IFAC Workshop “On-line fault detection and supervision in the chemical process industries”, Kyoto (1986), Newark (1992), Newcastle (1995), Folaize (1998), Cheju (2001), [1.18].

Further original publications are cited together with the treatment of the individual subjects in the following chapters.
Fundamentals
Supervision and fault management of processes: tasks and terminology


The supervision of technical processes aims at showing the present state, indicating undesired or impermissible states, and taking appropriate actions to avoid damage or accidents. Deviations from normal process behavior result from faults and errors, which can be attributed to many causes. If no counteractions are taken, they may lead to shorter or longer periods of malfunctions or failures. One reason for supervision is to avoid these malfunctions or failures. The following sections briefly describe the basic tasks of supervision and then take a closer look at some terms and the terminology used in this field.


2.1 Basic tasks of supervision 

Consider a process P or a product operating in open loop, Figure 2.1a. U(t) and Y(t) are a measurable input and output signal, respectively. A fault can now appear due to external or internal causes. Examples of external causes are environmental influences like humidity, dust, chemicals, electromagnetic radiation and high temperature, leading, e.g., to corrosion and pollution. Examples of internal causes are missing lubrication, and therefore higher friction or wear, overheating, leaks and short circuits. These faults F(t) first affect internal process parameters Θ by ΔΘ(t), like changes of resistances, capacitances or stiffness, and/or internal state variables x(t) by Δx(t), like changes of mass flows, currents or temperatures, which are frequently not measurable. According to the dynamic transfer behavior of the process, the faults influence the measurable output Y(t) by a change ΔY(t). However, it has to be taken into account that natural process disturbances and noise N(t), as well as changes of the manipulated variable U(t), also influence Y(t).

For a process operating in open loop, a remaining fault f(t) generally results in a permanent offset ΔY(t), as shown in Figure 2.2. In the case of a closed loop the behavior is different. Depending on the time history of the parameter changes ΔΘ(t) or state variable changes Δx(t), the output shows only a more or less short and vanishing small deviation ΔY(t) if a controller with integral behavior (e.g. a PI controller) is used. But then the manipulated variable shows a permanent offset ΔU(t) for proportionally acting processes. If only the output Y(t) is supervised, the fault may not be detected because of the small and short deviation, which is furthermore corrupted by noise. The reason is that a closed loop is able to compensate not only for disturbances N(t) but also for parameter changes ΔΘ(t) and state changes Δx(t) with regard to the control variable Y(t). This means that faults F(t) may be compensated by the closed loop. Only if the fault grows in size and causes the manipulated variable to reach a restriction value (saturation) may a permanent deviation ΔY arise. For processes in closed loop, therefore, U(t) should be monitored as well as Y(t), which is frequently not done. Mostly only Y(t) and the control deviation e(t) are supervised.

Fig. 2.1. Scheme of a process influenced by faults F: (a) process in open loop; (b) process in closed loop
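The compensating effect of a closed loop can be reproduced with a small simulation. This is a minimal sketch with hypothetical values: a first-order, proportionally acting process y(k+1) = a·y(k) + b·g·u(k) whose gain g drops by 30 % at k = 150 (a multiplicative fault), operated either with a constant input (open loop) or with a discrete PI controller (closed loop). The process parameters and controller tuning are assumptions chosen only so that the loop is stable.

```python
from typing import List, Tuple


def simulate(closed_loop: bool, n: int = 400, k_fault: int = 150,
             a: float = 0.9, b: float = 0.1, g_ok: float = 1.0, g_fault: float = 0.7,
             w: float = 1.0, kp: float = 0.5, ki: float = 0.1
             ) -> Tuple[List[float], List[float]]:
    """First-order process y(k+1) = a*y(k) + b*g*u(k) whose gain g drops at k_fault.

    Open loop: constant input u = w.
    Closed loop: discrete PI controller u(k) = kp*e(k) + ki*sum(e), e = w - y.
    """
    y, integral = 0.0, 0.0
    ys, us = [], []
    for k in range(n):
        g = g_ok if k < k_fault else g_fault   # multiplicative (parametric) fault
        if closed_loop:
            e = w - y                           # control deviation
            integral += e
            u = kp * e + ki * integral
        else:
            u = w                               # static gain is 1 without the fault
        ys.append(y)
        us.append(u)
        y = a * y + b * g * u
    return ys, us


if __name__ == "__main__":
    for mode in (False, True):
        ys, us = simulate(closed_loop=mode)
        label = "closed loop" if mode else "open loop"
        print(f"{label}: Y before/after fault {ys[149]:.3f}/{ys[-1]:.3f}, "
              f"U before/after fault {us[149]:.3f}/{us[-1]:.3f}")
```

In open loop, Y settles at a permanent offset (about 0.7 instead of 1). In closed loop, Y returns to the setpoint while U shifts permanently to about 1/0.7, so in this example the fault is visible only in the manipulated variable.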

The supervision of technical processes in normal operation, or the quality control of products in manufacturing, is usually performed by limit checking or threshold checking of a few measurable output variables Y(t), such as pressures, forces, liquid levels, temperatures, speeds and oscillations. This means that one checks whether the quantities lie within a tolerance zone $Y_{\min} < Y(t) < Y_{\max}$. An alarm is raised if the tolerance zone is exceeded. Hence, a first task of supervision is, compare Figure 2.3:

1. Monitoring: Measurable variables are checked with regard to tolerances, and 
alarms are generated for the operator. After an alarm is triggered the operator then 
has to take appropriate counteractions. 

However, if exceeding a threshold means a dangerous process state, the counteraction should be generated automatically. This is a second task of supervision, Figure 2.3:

2. Automatic protection: In the case of a dangerous process state, the monitoring 
function automatically initiates an appropriate counteraction. Usually, the process is 
then commanded to a fail-safe state, which is frequently an emergency shut down. 
Table 2.1 shows some examples. 
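The two classical tasks, monitoring and automatic protection, can be sketched as limit checking with two levels: leaving the tolerance zone only raises an alarm for the operator, whereas reaching a trip limit triggers an automatic counteraction. This is a minimal sketch; the boiler temperature limits are hypothetical and only loosely follow the heating-boiler example in Table 2.1.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    ALARM = "alarm to operator"                # task 1: monitoring
    PROTECTION = "automatic fail-safe action"  # task 2: automatic protection


def supervise(y: float, y_min: float, y_max: float, y_trip: float) -> Action:
    """Limit checking of one measured variable.

    y_trip (> y_max) marks a dangerous state that triggers automatic protection;
    leaving the tolerance zone y_min < y < y_max only raises an alarm.
    """
    if y >= y_trip:
        return Action.PROTECTION
    if not (y_min < y < y_max):
        return Action.ALARM
    return Action.NONE


if __name__ == "__main__":
    # Hypothetical boiler water temperatures in degC: tolerance 60..90, trip at 100
    for temp in (75.0, 95.0, 101.0, 55.0):
        print(temp, supervise(temp, y_min=60.0, y_max=90.0, y_trip=100.0).value)
```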
Fig. 2.2. Time behavior of a parameter change ΔΘ and the measurable signals Y(t) and U(t) after the appearance of fault f: (a) open loop; (b) closed loop



Fig. 2.3. Monitoring and automatic protection 
Table 2.1. Examples of automatic protection

| Process | Fault | Counteraction (safe state) | Name of protection device |
|---|---|---|---|
| electric cable | short circuit | interruption of current | electrical fuse |
| electrical motor | overheating | interruption of current | temperature protector |
| steam turbine | overspeed | fast closing of valve | overspeed protector |
| heating boiler | overheating (boiling) | interruption of fuel supply | safety temperature switch |
| aircraft combustion engine | break of flexible linkage | full throttle (max. power) | throttle spring |
| automotive engine | break of flexible linkage | idle gas (min. power) | throttle spring |


These classical methods of monitoring and automatic protection are suitable for the overall supervision of processes. To set the tolerances, compromises have to be made between the detectable size of abnormal deviations and unnecessary or false alarms caused by normal fluctuations of the variables. Most frequently, limit checking with fixed thresholds is applied, which works well if the process stays in steady state or the monitored variable does not depend on the operating point. However, the situation becomes more involved if the monitored variable changes dynamically with the operating point, e.g. forces in rolling mills or machine tools, or pressures and temperatures in chemical batch processes.

The advantage of the classical limit-value-based supervision methods is their simplicity and reliability in steady-state situations. However, it is only possible to react after a relatively large change of a process feature, i.e. after a large sudden fault or a long-lasting, gradually increasing fault. In addition, an in-depth fault diagnosis is usually not possible based on the threshold violation of one or a few variables.

To improve the supervision of technical processes or the quality control of manufactured products, a first step could be to implement additional sensors related to expected faults and to implement the operators' know-how in computers. However, using additional sensors, cables, transmitters and plugs to obtain better information on special faults not only increases the costs but at the same time deteriorates the overall reliability, because the probability of faults increases with the number of elements. Also, the direct software implementation of operator knowledge is not an easy task and does not lead much further without physically based process models.

For large-scale processes with many monitored and limit-checked values there is another problem: after a severe process fault or failure, several alarms may be triggered in a short time sequence, known as an “alarm shower”. The operators are then overloaded, both in their immediate reactions and in finding the causes of the faulty behavior.
Therefore advanced methods of supervision, fault detection and fault diagnosis are required which satisfy the following requirements:

(i) early detection of small faults with abrupt or incipient time behavior; 

(ii) diagnosis of faults in the processes or process parts, their manipulating devices (actuators) and measurement equipment (sensors);

(iii) detection of faults in closed loops; 

(iv) supervision of processes in transient states. 

The goal of early fault detection and diagnosis is to gain enough time for counteractions such as other operations, reconfiguration, planned maintenance or repair.

Figure 2.4 shows a general scheme of how, in addition to classical monitoring and automatic protection, these goals can be reached by automatic means. The intention is to generate more information about the process by using all available measurements and relating them to each other in the form of mathematical process models. If not only the output signals Y(t) are measured but also the corresponding input signals U(t), some accessible state variables x(t) and possibly disturbance signals, then the changes in the static and dynamic behavior of the processes caused by the faults can be used as an important source of information. Changes of the output signals ΔY(t) that are caused not by faults but by input signals ΔU(t) or measurable disturbances are then automatically taken into account, which makes the observed comparison variables more sensitive to faults. This means that the effects of normal disturbances and of faults on the outputs Y(t) are automatically separated.

The general scheme in Figure 2.4 shows in the third level the following tasks: 

3. Supervision with fault diagnosis 

(a) feature generation by, e.g. special signal processing, state estimation,
identification and parameter estimation or parity relations;

(b) fault detection and generation of symptoms;

(c) fault diagnosis by using analytical and also heuristic symptoms and their
relations to faults, e.g. by classification methods or reasoning methods via
fault-symptom trees. The goal is to determine the kind, size and location of the fault;

(d) fault evaluation, i.e. classification of the faults into different hazard classes;

(e) decision on actions depending on the hazard class and the possible degree of danger.
This may be done either automatically or by the operator. Some examples of
hazard classes are shown in Table 2.2.

Based on the in-depth information gained about the condition of the process,
further tasks are necessary in order to improve the reliability or safety:

4. Supervision actions and fault management: Depending on the hazard classes of
the diagnosed fault(s), the following actions can be taken:

(a) safe operation, e.g. shut-down if there is an imminent danger for the process or
the environment;
Fig. 2.4. General scheme of different supervision methods with fault management (supervisory
loop) 


(b) reliable operation, e.g. by preventing further fault propagation through changes
of the operating state, such as operation with lower load, speed, pressure or temperature;

(c) reconfiguration, e.g. by using other sensors, actuators or redundant (standby)
components to keep the process in operation and under control with a “reconfigured”
structure;

(d) inspection to perform a detailed diagnosis by additional measures; 

(e) maintenance, e.g. immediately or at the next opportunity, to tune process
parameters or exchange worn parts;

(f) repair, e.g. immediately to remove a fault, or at the next opportunity (overhaul or
revision).

These actions are also called fault management and may incorporate several
intermediate actions in the case of redundant systems if the process is in a dangerous
state, as, e.g. for aircraft, power plants, chemical plants or automatic guided vehicles.

Hence, the advanced methods of supervision and the subsequent actions are means to
improve both the reliability and the safety of technical systems. Of course, these
improvements by better information processing and computational intelligence have to
be accompanied on the process side by further improving the reliability of all
hardware components, e.g. by proper materials, stress and overall design. Some further
interesting developments are
Table 2.2. Examples for the evaluation of faults and related actions (fault management)

Hazard class | Effect on safety | Effect on reliability | Actions during operation | Actions after operation | Example
1 | high | high | emergency fail-safe (e.g. stop) | repair | bearing without lubrication
2 | medium | medium | other operation state | maintenance | bearing with too low oil pressure
3 | medium | high | reconfiguration | repair | one pump of a duplex pump system overheated
4 | no | high | transfer to reliable state | repair | unbalanced wheels
5 | no | small | none | maintenance | leaking seals


• maintenance on request (process condition); 

• tele-diagnosis with modern communication; 

• 100 % quality control of products. 

As maintenance costs in particular represent in most cases a high percentage (e.g.
up to 20 %) of the overall operating costs, advanced supervision and diagnosis may help
to reduce maintenance effort and costs and to extend the lifetime of the processes.

The general scheme in Figure 2.4 shows that there is a feedback loop from the
faults via signals, features, symptoms and decisions to various actions that compensate
for the faults. Therefore, it can be called a supervisory loop or fault management loop.
However, unlike in feedback control, not all signals or states act continuously. Some
parts of the information processing, like signal evaluation, feature generation and
symptom generation, may operate continuously, but fault diagnosis, decision making
and actions act as discrete events when a fault appears. Hence, the
supervisory loop is a hybrid continuous and discrete event system.
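The hybrid character of the supervisory loop can be illustrated with a small Python sketch. It is not taken from the book: the class name, the single residual y_measured - y_model and the fixed threshold are assumptions chosen for illustration. The residual and the symptom are evaluated at every sample (continuous part), while entries are appended to the event list only when a symptom appears or disappears (discrete-event part, where diagnosis, decision and action would be attached).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SupervisoryLoop:
    """Hypothetical sketch of a hybrid supervisory loop (names are illustrative)."""

    threshold: float  # assumed tolerance on the residual
    events: List[Tuple[int, str]] = field(default_factory=list)

    def step(self, k: int, y_measured: float, y_model: float) -> None:
        # Continuous part: residual (feature) and symptom are computed every sample.
        residual = y_measured - y_model
        symptom = abs(residual) > self.threshold
        fault_active = bool(self.events) and self.events[-1][1] == "fault"
        # Discrete part: only a change of the symptom state creates an event.
        if symptom and not fault_active:
            self.events.append((k, "fault"))    # e.g. start diagnosis, select an action
        elif not symptom and fault_active:
            self.events.append((k, "cleared"))  # e.g. return to normal operation


loop = SupervisoryLoop(threshold=0.5)
measured = [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 2.0, 1.0, 1.0, 1.0]
for k, y in enumerate(measured):
    loop.step(k, y, y_model=1.0)
print(loop.events)  # [(4, 'fault'), (7, 'cleared')]
```

Although the loop runs at every sample, only two discrete events are produced for the whole record, which mirrors the mixed continuous/discrete nature described above.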

The known literature on the state of the art of supervision and fault management
is mostly related to specific processes, for example:

• machines: [2.39], [2.10], [2.2], [2.9], [2.21]; 

• electrical motors: [2.6], [2.7]; 

• pumps: [2.7], [2.25], [2.42], [2.4], [2.11]; 

• steam turbines: [2.34]; 

• manufacturing: [2.33]; 

• bearings and machinery: [2.5], [2.43], [2.40], [2.21]; 

• aircraft: [2.27], [2.24], [2.23]; 

• automotive systems: [2.31], [2.20], [2.18]; 

• chemical processes: [2.12], [2.32]. 

Books on model-based methods for fault detection are: [2.28], [2.29], [2.16],
[2.8], [2.3], [2.35], [2.1].

The subject of fault-tolerant systems is treated in [2.22], [2.37]. 
2.2 Faults, failures, malfunctions

As the field treated here, from faults and failures through reliability, safety and
fault-tolerant systems, is distributed over many different technological areas, the
terminology used is not unique. Various efforts have been made towards a
standardization, for example, the RAM (reliability, availability and maintainability)
dictionary, [2.26], contributions such as [2.14] and several German standards such as
DIN and VDI/VDE-Richtlinien (guidelines). The IFAC Technical Committee SAFEPROCESS has
made an effort to agree on accepted definitions, [2.19], see also Chapter 23.1. A survey
of related standardization literature is given in the bibliography of Chapter 23.1. The
next sections describe the terminology used in this book, taking into account the
mentioned literature.

Fault: 

“A fault is an unpermitted deviation of at least one characteristic property (feature) 
of the system from the acceptable, usual, standard condition.” 

Remarks: 

• a fault is a state within the system; 

• the unpermitted deviation is the difference between the fault value and the
violated threshold of a tolerance zone for its usual value;

• a fault is an abnormal condition that may cause a reduction in, or loss of, the 
capability of a functional unit to perform a required function [2.13]; 

• there exist many different types of faults, e.g. design fault, manufacturing fault, 
assembling fault, normal operation fault (e.g. wear), wrong operation fault (e.g. 
overload), maintenance-fault, hardware-fault, software-fault, operator’s fault. (Some 
of these faults are also called errors, especially if directly caused by humans); 

• a fault in the system is independent of whether the system is in operation or not; 

• a fault may not affect the correct functioning of a system (like a small crack in an
axle);

• a fault may initiate a failure or a malfunction; 

• frequently, faults are difficult to detect, especially if they are small or hidden;

• faults may develop abruptly (stepwise) or incipiently (driftwise). 

Failure: 

“A failure is a permanent interruption of a system’s ability to perform a required 
function under specified operating conditions.” 

Remarks: 

• a failure is the termination of the ability of a functional unit to perform a required 
function, [2.13]; 

• a failure is an event; 
• a failure results from one or more faults;

• different types of failures can be distinguished: 

- number of failures: single, multiple; 

- predictability: 

random failure (unpredictable, e.g. statistically independent of
operation time or other failures);

deterministic failure (predictable for certain conditions); 
systematic failure or causal failure (dependent on known conditions); 

• usually a failure arises after the start of operation or under increasing stress on
the system.

Malfunction: 

“A malfunction is an intermittent irregularity in the fulfillment of a system’s desired 
function.” 

Remarks: 

• a malfunction is a temporary interruption of a system’s function; 

• a malfunction is an event; 

• a malfunction results from one or more faults; 

• usually a malfunction arises after the start of operation or under increasing stress
on the system.

Figure 2.5 shows the relation of faults, failures and malfunctions. The fault may
develop abruptly, like a step function, or incipiently, like a drift. The
corresponding feature of the system related to the fault is assumed to be proportional
to the fault development. After exceeding the tolerance of normal values, the feature
indicates a fault. Depending on its size, a failure or a malfunction of the system
follows at time t_e. Table 2.3 shows some examples.
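The mechanism of Figure 2.5 can be mimicked numerically. The following sketch assumes, as the text does, that the feature is proportional to the fault; the gain, tolerance, onset time, step size and drift slope are hypothetical values. For each fault type it returns the first sample at which the feature leaves the tolerance band, i.e. the moment the feature indicates a fault.

```python
def feature_trajectory(kind, n, onset, size, gain=1.0):
    """Feature f(k) = gain * fault(k), k = 0..n-1, for an assumed fault shape.

    kind "step":  abrupt fault that jumps to `size` at k = onset
    kind "drift": incipient fault that grows by `size` per sample after onset
    """
    values = []
    for k in range(n):
        if k < onset:
            fault = 0.0
        elif kind == "step":
            fault = size
        elif kind == "drift":
            fault = size * (k - onset)
        else:
            raise ValueError(f"unknown fault kind: {kind}")
        values.append(gain * fault)
    return values


def detection_time(values, tolerance):
    """First index at which |feature| exceeds the tolerance, or None if never."""
    for k, v in enumerate(values):
        if abs(v) > tolerance:
            return k
    return None


step = feature_trajectory("step", n=50, onset=10, size=2.0)
drift = feature_trajectory("drift", n=50, onset=10, size=0.1)
print(detection_time(step, tolerance=1.0))   # 10
print(detection_time(drift, tolerance=1.0))  # 21
```

With these assumed numbers the abrupt fault is indicated immediately at its onset, whereas the incipient fault needs eleven samples to grow beyond the tolerance, which is why small drifting faults are harder to detect early.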


2.3 Reliability, availability, safety 

With regard to the overall functioning of elements, components, processes and
systems, the terms reliability, availability and safety play an important role. These terms
are considered in more detail in Chapters 3 and 4.

Reliability: 

“Ability of a system to perform a required function under stated conditions, within a
given scope, during a given period of time.”


Remarks: 

• short version: ability to perform a required function for a certain period of time; 
Fig. 2.5. Development of the events “failure” or “malfunction” from a fault which causes a
stepwise or driftwise change of a feature


Table 2.3. Examples for faults, failures and malfunctions

Process | Fault | Feature | Failure | Malfunction
electrical illumination | switch with corroded contacts | occasionally no electrical conductivity | - | interrupted light
electrical illumination | broken wire in cable | no electrical conductivity | no light | -
electrical DC motor | worn brushes | armature resistance high | - | occasionally interrupted torque and changing speed
electrical DC motor | broken wire in excitation coil | no electrical flux | no torque, no speed | -
machine tool belt drive | belt with too low pretension | no continuous torque transfer | - | sluggish dynamics, piecewise motion
machine tool belt drive | broken belt threads | no torque transfer | standstill of feed drive | -
pneumatic valve | leak in supply air pressure | slow motion, limited position range | - | closed loop does not follow setpoint for some time
pneumatic valve | corroded shaft | mechanical friction too high | no motion, permanent control deviation | -
• reliability is quality for some time;

• the reliability can be affected by malfunctions and failures; 

• a measure for reliability is the Mean Time To Failure MTTF = 1/λ, where λ is
the failure rate, i.e. the rate of failures per time unit (see Chapter 5).

Safety: 

“Ability of a system not to cause danger to persons or equipment or the environment.” 
Remarks: 

• short version: ability not to cause danger; 

• safety is concerned with the dangerous effects of faults, failures and
malfunctions;

• safety can usually be seen as a state in which the risk is not larger than a
specified risk limit (risk threshold).

The measures to improve the reliability are oriented towards avoiding faults,
failures and malfunctions. Measures for improving safety aim to avoid dangerous
effects of failures and malfunctions. An improvement of the reliability generally
also improves safety. However, an improvement of safety can result in a deterioration
of the reliability if, e.g. the number of components increases. Note that safety and
security have similar meanings. Safety usually deals with life, equipment or the
environment, whereas security deals with privacy, property, community or state.


Availability: 

“Probability that a system or equipment will operate satisfactorily and effectively at 
any period of time.” 


Remarks:

• availability is of major importance for the user of a system;

• availability takes into account that failures and malfunctions happen and need
some time for repair;

• a measure for availability is A = MTTF/(MTTF + MTTR), where MTTR is the Mean
Time To Repair, see Chapter 5;

• to reach a high availability, MTTF must be large in comparison to MTTR. This
can be reached by

- large operation time MTTF
  → perfection: highly reliable components;
  → tolerance: tolerable faults through a redundant structure;

- small repair time MTTR
  → fast and reliable fault diagnosis;
  → fast and reliable removal of faults (maintenance, repair);
• fault detection and fault diagnosis can improve the availability by early fault
detection in combination with maintenance on demand (larger MTTF) and by 
fast and reliable diagnosis (smaller MTTR). 
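The availability measure A = MTTF/(MTTF + MTTR) can be evaluated directly. In the sketch below the failure rate λ = 4·10^-5 1/h is borrowed from Example 3.1 of the next chapter, while the two repair times are purely hypothetical; the point is only to show how a faster diagnosis (smaller MTTR) shows up in A.

```python
def availability(mttf_h: float, mttr_h: float) -> float:
    """Availability A = MTTF / (MTTF + MTTR), both times in hours."""
    if mttf_h <= 0 or mttr_h < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_h / (mttf_h + mttr_h)


mttf = 1 / 4e-5                    # constant failure rate 4e-5 1/h -> MTTF = 25000 h
slow = availability(mttf, 50.0)    # hypothetical: slow diagnosis and repair
fast = availability(mttf, 10.0)    # hypothetical: fast, reliable diagnosis
print(f"A(MTTR=50 h) = {slow:.4f}, A(MTTR=10 h) = {fast:.4f}")
```

With these assumed numbers, shortening MTTR from 50 h to 10 h raises the availability from about 0.9980 to about 0.9996.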

Dependability: 

The term dependability does not seem to be clearly defined. Therefore, different meanings
are cited:

(i) “A form of availability that has the property of always being available when 
required (and not at any time). It is the degree to which a system is operable and 
capable of performing its required function at any randomly chosen time during 
its specific operating time, provided that the system is available at the start of the 
period” 

This definition excludes non-operation related influences, [2.26]; 

(ii) “Dependability is a property of a system that justifies placing one’s reliance on 
it. It covers reliability, availability, safety, maintainability and other issues of 
importance in critical systems”, [2.37]. 

The [2.13] standard on safety-related systems does not define dependability, only 
safety integrity. 

Integrity: 

According to [2.37], the term integrity was earlier defined as: 

“The integrity of a system is the ability to detect faults in its own operation and 
to inform a human operator”. 

Over the years the meaning was broadened and associated with critical systems. 
Integrity is frequently used as a synonym for dependability. According to [2.13] it is 
defined as: 

“Safety integrity is the probability of a safety-related system satisfactorily
performing the required safety functions under all the stated conditions within a period
of time”.

Some other expressions like accident, hazard, risk are defined in Chapter 4. 


2.4 Fault tolerance and redundancy 

Even after applying reliability and safety analysis to improve the design, testing
the product and applying corresponding quality control methods during manufacturing,
the appearance of certain faults and failures cannot be avoided entirely. Therefore,
these unavoidable faults should be tolerated by additional design efforts. Hence,
high-integrity systems must have the ability of fault tolerance. This means that faults
are compensated in such a way that they do not lead to system failures. After the
application of principles to improve the perfection of the components, the remaining
obvious way to reach this goal is to implement redundancy. This means that in addition
to the considered module, one or more modules exist as back-up modules, usually
in a parallel configuration, see Figure 2.6.



Fig. 2.6. Basic scheme of a fault-tolerant system with parallel function modules as redundancy


The function modules can be hardware components or software parts, either identical
or diverse. Different arrangements of fault-tolerant systems exist with static or
dynamic redundancy, cold or hot standby. In general, the function modules are
supervised with fault-detection capability, followed by a reconfiguration mechanism to
switch off failed modules and to switch on spare modules (dynamic redundancy).
The modules are, e.g. actuators, sensors, computers, motors or pumps. For electronic
hardware, simpler schemes exist with n ≥ 3 modules and majority voters to build up,
e.g. 2-out-of-3 systems (static redundancy). These redundant systems are treated in
Part IV.
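A 2-out-of-3 majority voter for static redundancy can be sketched in a few lines of Python. This is an illustrative implementation for discrete (e.g. binary) module outputs, not the scheme of Part IV; the function name and the choice to also return the index of the deviating module are assumptions.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def vote_2oo3(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], Optional[int]]:
    """2-out-of-3 majority voter for discrete module outputs.

    Returns (voted value, index of the disagreeing module). The index is None
    if all three modules agree; the value is None if no two modules agree.
    """
    if len(outputs) != 3:
        raise ValueError("a 2-out-of-3 voter needs exactly three module outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        return None, None          # no majority: the fault cannot be masked
    if count == 3:
        return value, None         # all modules agree
    faulty = next(i for i, out in enumerate(outputs) if out != value)
    return value, faulty


print(vote_2oo3([1, 1, 1]))        # (1, None)
print(vote_2oo3([1, 0, 1]))        # (1, 1): module 1 is outvoted
```

A single faulty module is masked and at the same time identified; for analog signals one would usually compare values within a tolerance or take the median instead of testing for exact equality.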


2.5 Knowledge-based fault detection and diagnosis 

As fault detection and fault diagnosis are fundamental for advanced methods of
supervision and fault management, these tasks will be considered briefly. Fault
detection and diagnosis are, in general, based on variables measured by instruments
and on variables and states observed by human operators. The automatic processing
of measured variables for fault detection requires analytical process knowledge,
and the evaluation of observed variables requires human expert knowledge, which is
called heuristic knowledge. Therefore, fault detection and diagnosis can be considered
within a knowledge-based approach, [2.30], [2.38]. Figure 2.7 shows an overall
scheme, [2.15], [2.17].
Fig. 2.7. Overall scheme of knowledge-based fault detection and diagnosis


2.5.1 Analytic symptom generation 

The analytical knowledge about the process is used to produce quantifiable,
analytical information. To do this, data processing based on measured process variables
has to be performed to first generate the characteristic values by

• limit value checking of directly measurable signals. The characteristic values are
the violated signal tolerances;

• signal analysis of directly measurable signals by the use of signal models like
correlation functions, frequency spectra or autoregressive moving average (ARMA)
models. The characteristic values are, e.g. variances, amplitudes, frequencies or
model parameters;

• process analysis by using mathematical process models together with parameter
estimation, state estimation and parity equation methods. The characteristic
values are parameters, state variables or residuals.
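The three ways of generating characteristic values can be sketched for a toy example. The process below is hypothetical: a first-order difference equation y(k) = 0.8 y(k-1) + 0.2 u(k-1) is assumed to be known exactly, and a sensor offset of 0.5 appears from k = 20 on. The residual function is a simple equation-error check in the spirit of parity equations, not the book's specific method.

```python
import statistics


def limit_check(signal, lower, upper):
    """Limit value checking: indices k at which the signal leaves [lower, upper]."""
    return [k for k, y in enumerate(signal) if not lower <= y <= upper]


def signal_features(signal):
    """Simple signal-analysis features of a measured signal: mean and variance."""
    return {"mean": statistics.fmean(signal), "variance": statistics.pvariance(signal)}


def residuals(u, y, a, b):
    """r(k) = y(k) - [a*y(k-1) + b*u(k-1)] for an assumed first-order model, k >= 1."""
    return [y[k] - (a * y[k - 1] + b * u[k - 1]) for k in range(1, len(y))]


# Hypothetical process y(k) = 0.8*y(k-1) + 0.2*u(k-1) with a sensor offset from k = 20
u = [1.0] * 40
y = [0.0]
for k in range(1, 40):
    y.append(0.8 * y[-1] + 0.2 * u[k - 1])
y_meas = [v + (0.5 if k >= 20 else 0.0) for k, v in enumerate(y)]

print(limit_check(y_meas, lower=-0.1, upper=1.2)[:3])            # [20, 21, 22]
print(signal_features(y_meas[30:]))
print([round(r, 3) for r in residuals(u, y_meas, a=0.8, b=0.2)[17:22]])
# [0.0, 0.0, 0.5, 0.1, 0.1]  (residuals for k = 18..22)
```

Here the limit check only reacts because the offset happens to push the signal above 1.2, whereas the model-based residual is exactly zero in the fault-free part and jumps as soon as the offset appears, independently of the operating point.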

In some cases, special features can then be extracted from these characteristic 
values, e.g. physically defined process coefficients, or special filtered or transformed 
residuals. These features are then compared with the normal features of the
non-faulty process. For this, methods of change detection and classification are applied.
The resulting changes (discrepancies) in the mentioned directly measured signals, 
signal models or process models are considered as analytic symptoms. 
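One widely used change-detection method is the two-sided CUSUM test; the book does not prescribe it at this point, so the sketch below should be read as one possible choice. It compares a feature sequence with its assumed normal value mean0 and accumulates deviations beyond a slack drift; a symptom (alarm) is raised when the accumulated sum exceeds threshold. All numbers are hypothetical.

```python
def cusum_alarms(x, mean0, drift, threshold):
    """Two-sided CUSUM test on a feature sequence x.

    mean0: normal (fault-free) feature value, drift: slack subtracted per sample,
    threshold: decision level. Returns the indices at which an alarm is raised.
    """
    g_pos = g_neg = 0.0
    alarms = []
    for k, xk in enumerate(x):
        deviation = xk - mean0
        g_pos = max(0.0, g_pos + deviation - drift)   # accumulates upward changes
        g_neg = max(0.0, g_neg - deviation - drift)   # accumulates downward changes
        if g_pos > threshold or g_neg > threshold:
            alarms.append(k)
            g_pos = g_neg = 0.0                        # restart after each alarm
    return alarms


feature = [0.0] * 20 + [0.3] * 20        # hypothetical small feature change at k = 20
print(cusum_alarms(feature, mean0=0.0, drift=0.1, threshold=0.9))  # [24, 29, 34, 39]
```

A fixed limit of 0.9 on the feature itself would never fire for this change of 0.3, while the accumulated sum detects it after five samples; this is the kind of sensitivity to small changes that the comparison with normal features aims at.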
2.5.2 Heuristic symptom generation

In addition to the symptom generation using quantifiable information, heuristic
symptoms can be produced by using qualitative information from human operators.
Through human observation and inspection, heuristic characteristic values in
the form of special noises, colors, smells, vibration, wear and tear, etc., are obtained.
The process history in the form of maintenance performed, repairs, former faults,
lifetime and load measures constitutes a further source of heuristic information.
Statistical data (e.g. MTBF, fault probabilities) obtained from experience with the
same or similar processes can be added. In this way heuristic symptoms are generated,
which can be represented as linguistic variables (e.g. small, medium, large) or
as vague numbers (e.g. around a certain value).
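Linguistic variables are often represented by fuzzy membership functions. The sketch below assumes triangular membership functions and a hypothetical 0 to 10 rating scale for an operator's vibration assessment; the term boundaries are illustrative, not values from the book. A vague number such as "around 5" corresponds to one triangle centered at 5.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for an operator's vibration rating on a 0..10 scale
TERMS = {
    "small": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "large": (5.0, 10.0, 15.0),
}


def fuzzify(x):
    """Degree of membership of the rating x in each linguistic term."""
    return {name: triangular(x, *abc) for name, abc in TERMS.items()}


print(fuzzify(7.0))  # {'small': 0.0, 'medium': 0.6, 'large': 0.4}
```

A rating of 7 is thus partly "medium" and partly "large", which is the unified, graded form in which heuristic symptoms can later be combined with analytic ones.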

2.5.3 Fault diagnosis 

The task of fault diagnosis consists in determining the type, size and location of the
most probable fault, as well as its time of detection.

Fault-diagnosis procedures use the analytic and heuristic symptoms. Therefore,
these should be presented in a unified form, like confidence numbers, membership
functions of fuzzy sets or probability density functions after a statistical evaluation
over some time. If a learned, pattern-based procedure is preferred, classification
methods can then be applied to determine the faults from symptom patterns or
clusters. If, however, more information about the fault-symptom relations is known,
e.g. in the form of logical fault-symptom trees or if-then rules, reasoning methods with
forward and backward chaining can be applied.
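Reasoning with if-then rules by forward chaining can be illustrated with a tiny rule base. The rules and fact names below are hypothetical, loosely inspired by the bearing examples of Table 2.2, and the symptoms are treated as plain true/false facts rather than as confidence numbers or membership values.

```python
# Hypothetical fault-symptom rules: (set of premises, conclusion)
RULES = [
    ({"oil_pressure_low", "bearing_temperature_high"}, "lubrication_fault"),
    ({"vibration_high", "speed_normal"}, "unbalance"),
    ({"lubrication_fault"}, "bearing_damage_risk"),
]


def forward_chain(symptoms, rules):
    """Fire every rule whose premises are all known, until nothing new can be derived."""
    facts = set(symptoms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts - set(symptoms)      # only the derived conclusions


print(sorted(forward_chain({"oil_pressure_low", "bearing_temperature_high"}, RULES)))
# ['bearing_damage_risk', 'lubrication_fault']
```

Forward chaining starts from the observed symptoms and derives possible faults; backward chaining would instead start from a hypothesized fault and check whether its premises are supported by the symptoms.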

The terminology used in this field is described in [2.19], based on definitions of 
the IFAC Technical Committee SAFEPROCESS, see Chapter 23.1. 

The related methods for knowledge-based fault detection and fault diagnosis are
considered in detail in Parts II and III of this book.


2.6 Implementation issues 

The development of fault detection and diagnosis systems (FDD-systems) and
fault-tolerant systems can be represented as a “V”-diagram, which probably originates from
[2.36] and is especially used for the design of mechatronic systems, [2.41], see
Figure 2.8. This diagram indicates important steps of the development process in a
sequential manner. However, though the steps are given in logical order, some steps
are performed in parallel or iteratively. Each phase usually has an outcome,
called a “deliverable”.

The development starts with stating the requirements. The expected functions are
summarized, such as the faults to be detected and diagnosed (a fault list), the smallest
replaceable units which can be replaced if they contain a fault, and the allowable costs
for the development and the final product. A reliability and safety analysis at this stage
may be very useful to find the weak points and risks of the considered product or
process. Because the requirements are the basis for the top-level design procedure,
great care should be taken in their preparation, and they should be continuously updated
during the design process if basic changes happen.

Fig. 2.8. A “V” development scheme for fault detection and diagnosis systems (FDD-system)

Based on the requirements, the specifications are formulated. They state how the
requirements are to be fulfilled, e.g. by partitioning of the functions or parts, and
cover the available sensors and actuators, the available computing power, the use of
further knowledge and the definition of milestones.

The fault-diagnosis system design frequently begins with mathematical modelling
of the process, its signals and the expected faults. This also includes simulation
of the behavior without and with faults. On the basis of these considerations, the
methods for fault detection and diagnosis (FDD) and, if required, for fault tolerance
are designed. This is followed by the development of the FDD-methods
with software-in-the-loop simulations (SiL). Here, common software systems are
used for the simulation of the process, the faults and the FDD-functions. Next, a
powerful real-time prototype computer can be applied together with the real process
(also called prototyping). The FDD-system is then mature enough for the implementation
of the final software on the series-product microcomputer.

Then the various testing and timing procedures for the FDD-microcomputer
hardware and software begin. First tests can be made by hardware-in-the-loop
simulation (HiL). Here the microcomputer works together with other real parts, like
actuators, and with a real-time simulation of the process on another powerful computer.
This requires a special sensor-simulation interface. HiL is performed if expensive
tests with the real system can be saved, or if experiments with faults are made which
are not allowed with the real process. Otherwise, tests are made directly with the
real process.

If the FDD-system is implemented together with other functions, e.g. of automatic
control, system integration takes place, considering the functional dependencies of
all control levels, from lower-level control to top-level process management,
including documentation of FDD-results. The next tests are system tests, including
verification and validation. Verification examines whether the system meets its
specifications, i.e. fulfills the functions of the specifications correctly. Validation
considers the system as a whole with regard to satisfying the requirements, i.e.
examines whether the system is appropriate for its intended purpose. Therefore, it
includes consideration of the correctness of the specification. For critical systems,
external regulating authorities have to be convinced in order to achieve certification.
Here, stated standards and guidelines are checked and tests have to show, e.g. the fault
coverage for given operating conditions. Field tests are usually undertaken to test the
system under many different operating conditions, production tolerances of the
processes and harsh environmental conditions before the system is released to series
production.


2.7 Problems 

1) What are the basic tasks for the supervision of technical processes? 

2) State the differences between supervision and monitoring. 
3) Find other examples for automatic protection as in Table 2.1.

4) Which tasks are included in advanced supervision compared to classical
supervision?

5) State the sequence of tasks for advanced supervision.

6) What are the differences between faults, failures and malfunctions? Give one 
example for each case. 

7) State the definitions of reliability, safety and availability in a table. 

8) Find an example for a system with fault tolerance. 

9) What are the differences between analytical and heuristic symptoms? Give some 
examples. 

10) How can the tasks of fault detection and fault diagnosis be described and
differentiated?
Reliability, Availability and Maintainability (RAM)


Reliability, availability and maintainability are very important properties of modern
products. Therefore, this chapter describes these properties with the help of
characteristic quantities. Historically, reliability studies were first applied to electrical
and electronic components and quality control in manufacturing around the 1940s and
then extended to other areas like military equipment, aerospace vehicles, nuclear
power systems and automobiles, see, e.g. [3.4], [3.22]. In the following, use is made of
standards like [3.10] and [3.6]. Generally accepted terms and definitions are used
wherever possible.


3.1 Reliability 

Definition 

“Reliability is the ability of a component, process or a system to perform a required 
function correctly under stated condition within a given scope, during a given period 
of time.” 

The reliability is affected by faults and failures. Therefore the reliability analysis 
depends on the kind of faults and failures. 

3.1.1 Type of faults 

Faults usually show a characteristic behavior for the various components. They may 
be distinguished by their form, time behavior and extent, compare Table 3.1. The 
form can be either systematic or random. The time behavior may be described by 
permanent, transient, intermittent, noise or drift, see Figure 5.3. The extent of faults 
is either local or global and includes the size. 

Table 3.1 gives an overview of a variety of fault types depending on the
system components. Electronic hardware shows systematic faults if they originate in
specification or design mistakes. Once in operation, faults in hardware components
are mostly random, with all kinds of time behavior. The faults or mistakes in software
(bugs) are usually systematic, e.g. due to wrong specification, coding, logic, calculation
overflows, etc. In general, they are not random like faults in hardware.


Table 3.1. Main characteristics (✓) of primary faults for different components

type of faults | mechanical components | electrical components | electronic hardware | software
form: systematic | ✓ | | ✓ | ✓
form: random | | ✓ | ✓ |
time behavior: permanent | | | ✓ | ✓
time behavior: transient | ✓ | ✓ | ✓ |
time behavior: intermittent | | ✓ | ✓ | ✓
time behavior: noise | | ✓ | ✓ |
time behavior: drift | ✓ | ✓ | ✓ |
extent: local | ✓ | ✓ | ✓ | ✓
extent: global | | | ✓ | ✓


Failures of mechanical systems can be classified into the following failure
mechanisms: distortion (buckling, deformation), fatigue and fracture (cycle fatigue,
thermal fatigue), wear (abrasive, adhesive, cavitation) or corrosion (galvanic, chemical,
biological), see, e.g. [3.19]. They may appear as driftlike changes (wear, corrosion)
or abruptly (distortion, fracture), at any time or after stress. Electrical systems usually
consist of a large number of components with various failure modes, like short
circuits, loose or broken connections, parameter changes, contact problems,
contamination, EMC problems, etc. Generally, electrical faults appear more randomly than
mechanical faults. Table 3.1 mainly shows the effect of primary faults for the
different components and their typical behavior. The extent depends very much on the
importance of the considered components and can, of course, be global in all cases,
even if the faults primarily appear locally.

Reliability analysis is usually based on the assumption of random faults. This
holds especially for electronic and electrical components, and for large systems
with many components whose systematic faults seem to appear randomly because
of their large number, as for large mechanical systems and software systems,
compare the discussion, e.g. in [3.22].

3.1.2 Reliability estimation 

The reliability of a large number N of identical elements at the beginning of operation
is defined by the reliability function

R(t) = \frac{n(t)}{N} = \frac{\text{number of failure-free elements}}{\text{number of all elements at the beginning of operation}}    (3.1)
relating the number n(t) of correctly functioning elements to N. It describes the
probability that the elements function correctly until time t. The unreliability function is
then

Q(t) = \frac{n_f(t)}{N} = 1 - R(t)    (3.2)

with the number of failed elements

n_f(t) = N - n(t).


The failure rate is defined as the instantaneous rate of failing elements dn_f/dt related
to the still functioning elements n(t)

\lambda(t) = \frac{1}{n(t)} \frac{dn_f(t)}{dt} = \frac{1}{R(t)} \left( -\frac{dR(t)}{dt} \right) = \frac{1}{1 - Q(t)} \frac{dQ(t)}{dt}    (3.3)

or in words

\lambda(t) = \frac{1}{\text{number of functioning elements}} \cdot \frac{\text{number of failures}}{\text{time interval}}    (3.4)
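Relation (3.4) suggests a direct estimate of the failure rate from survivor counts of a lifetime test. The sketch below uses hypothetical counts of a population of 1000 elements recorded every 1000 h, and it approximates "number of functioning elements" by the count at the beginning of each interval, which is only accurate when few elements fail per interval.

```python
def empirical_failure_rate(survivors, dt):
    """Estimate lambda per interval from survivor counts n(t0), n(t1), ... spaced dt apart.

    Follows (3.4): failures in the interval / (functioning elements * interval length),
    with the functioning elements taken at the start of each interval.
    """
    rates = []
    for n_start, n_end in zip(survivors, survivors[1:]):
        if n_start == 0:
            break                      # no elements left to fail
        rates.append((n_start - n_end) / (n_start * dt))
    return rates


counts = [1000, 961, 923, 887]         # hypothetical survivors at t = 0, 1000, 2000, 3000 h
print([f"{r:.2e}" for r in empirical_failure_rate(counts, dt=1000.0)])
```

For these assumed counts all three interval estimates lie close to 4·10^-5 1/h, i.e. the data would be consistent with a constant failure rate in the useful-life region.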


Experience shows that electronic components and also electro-mechanical systems
show a strongly decreasing failure rate after commissioning (infant mortality), then a
constant value during the normal operating life (useful life) and finally an increasing
value in the ageing period (wear-out). This leads to the well-known “bathtub curve”,
Figure 3.1. During the normal operating life a constant failure rate λ can be assumed,
leading to the exponential failure law

R(t) = e^{-\lambda t}    (3.5)

see Figure 3.2.



Fig. 3.1. Typical failure rate λ(t) for randomly appearing faults as a function of lifetime
(“bathtub curve”)
Fig. 3.2. (a) Exponential reliability function R(t) = e^{-λt} for constant failure rate λ
(λ = 4·10^{-5} 1/h); (b) time history of elements which function correctly (fct = 1)
for t < t_f and fail at t = t_f (fct = 0)


Example 3.1: Reliability function 


For 1000 units and λ = 4·10^{-5} 1/h, the probability of correctly functioning units at
operating time t = 20000 h is R = 0.449. For the failed units it holds: Q = 0.551.
This means that n = 1000 · 0.449 = 449 units are still functioning (have survived)
and n_f = 1000 · 0.551 = 551 units have failed. The reliability function can also be
called survival probability.

□ 
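The numbers of Example 3.1 follow directly from (3.2) and (3.5). A minimal Python check, where `reliability` is just an illustrative helper name:

```python
import math


def reliability(lam, t):
    """Exponential reliability function (3.5): R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


N = 1000
lam = 4e-5      # failure rate in 1/h
t = 20_000.0    # operating time in h

R = reliability(lam, t)
Q = 1.0 - R                  # unreliability (3.2)
n_survived = N * R
n_failed = N * Q
print(f"R = {R:.3f}, Q = {Q:.3f}, n = {n_survived:.0f}, n_f = {n_failed:.0f}")
# prints: R = 0.449, Q = 0.551, n = 449, n_f = 551
```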


For the initial slope of the reliability function it follows with (3.3) and (3.5), since R(0) = 1,

$$\lim_{t \to 0} \frac{dR(t)}{dt} = -\lambda \tag{3.6}$$


Therefore, the initial tangent of R(t) cuts the time axis at t = 1/λ, see Figure 3.2a.


Remarks 

a) the reliability function R(t) describes the number of survived elements n relative 
to the initial number N of elements. It is a function of time; 

b) the failure rate λ relates the number of failures per time interval to the number
of survived elements. If λ = const., the number of failures per time interval, λ·n(t),
becomes smaller with time because fewer elements survive. Correspondingly, the
reliability function also decays with time.

Another measure of reliability is the Mean Time To Failure (MTTF). It is the
average failure-free (correct) operating time t_F until a failure occurs

$$\mathrm{MTTF} = E\{t_F\} = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} t_{F,i} \tag{3.7}$$


It can also be calculated by dividing the number of functioning elements n(t) by
the number of failures per time interval

$$\mathrm{MTTF} = \frac{n(t)}{dn_f(t)/dt} = \frac{\text{number of functioning elements}}{\text{number of failures}/\text{time interval}} \tag{3.8}$$


Comparison with (3.3) shows that the MTTF is the reciprocal of the failure rate

$$\mathrm{MTTF} = \frac{1}{\lambda} \tag{3.9}$$


For a constant failure rate it follows from the exponential reliability function

$$\mathrm{MTTF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda} \tag{3.10}$$


The MTTF represents the average operating time of a large number of non-repairable
elements before a failure happens, for the case that the failure rate is constant. However,
this means that the probability of a system still functioning at t = MTTF = 1/λ
is only

$$R(t) = e^{-1} = 0.37$$

Hence, after an operating time equal to the MTTF, the probability of correct operation is only
37 %. This underlines that the MTTF represents an average lifetime of a large number
of components. Table 3.2 and Table 3.3 show examples of failure rates of various
electronic and mechanical components.
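Both routes to the MTTF, the sample average (3.7) and the integral (3.10), can be checked numerically, together with R(MTTF) = e^-1. The Python sketch below is an illustration only: it integrates R(t) with the trapezoidal rule over the finite horizon [0, 20/λ] (an approximation chosen here, the neglected tail being of order e^-20), averages simulated exponential lifetimes, and uses λ = 4·10^-5 1/h from Example 3.1.

```python
import math
import random

lam = 4e-5   # constant failure rate in 1/h (value of Example 3.1)


def R(t):
    """Exponential reliability function (3.5)."""
    return math.exp(-lam * t)


def mttf_by_integration(rel, t_max, steps=200_000):
    """Approximate MTTF = integral of R(t) from 0 to infinity (3.10)
    by the trapezoidal rule on [0, t_max]."""
    h = t_max / steps
    total = 0.5 * (rel(0.0) + rel(t_max))
    for k in range(1, steps):
        total += rel(k * h)
    return total * h


mttf_int = mttf_by_integration(R, t_max=20.0 / lam)   # R(20/lam) = e^-20 is negligible

random.seed(0)
samples = [random.expovariate(lam) for _ in range(200_000)]
mttf_mc = sum(samples) / len(samples)                  # sample mean of t_F,i as in (3.7)

print(mttf_int, mttf_mc, 1.0 / lam)   # all close to 25000 h
print(R(1.0 / lam))                   # e^-1, about 0.37
```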


Table 3.2. Typical failure rates of mechanical and electromechanical elements, [3.19]

Mechanical elements | λ [h^-1]    | Electromechanical elements | λ [h^-1]
ball bearing        | 1.64·10^-6  | actuator, general          | 26·10^-6
sleeve bearing      | 2.38·10^-6  | brush, general             | 9·10^-6
belt                | 19.72·10^-6 | cable, general             | 1·10^-6
coupling            | 5.54·10^-6  | electric motor, general    | 80 (rest illegible)
gear                | 4.69·10^-6  | generator, general         | 73·10^-6
pump                | 43.65·10^-6 | regulator, general         | 80 (rest illegible)
seal                | 5.47·10^-6  |                            |
valve, hydraulic    | 8.83·10^-6  |                            |


A measure similar to the MTTF is the Mean Time Between Failures (MTBF).
According to [3.19] and [3.17], it is defined for repairable systems in which all failed
units are repaired periodically, see Section 3.3.
Table 3.3. Failure rates of electronic/electrical elements [3.14], stationary, 40°C, 60% power.
π_E: multiplication factor for mobile application

electronic/electrical elements | λ [h^-1]                 | mobile application π_E
discrete elements              |                          | 18
  transistors                  | 1–70·10^-9               |
  diodes                       | 1–6·10^-9                |
  thyristors                   | 36–360·10^-9             |
  resistors                    | 1–100·10^-9              |
  capacitors                   | 5–150·10^-9              |
analog integrated circuits     |                          | 4.2
  operational amplifier        | 0.3–0.9·10^-6            |
  analog switch                | 20·10^-6                 |
digital integrated circuits    |                          | 4.2
  logic elements               | 0.03·10^-6               |
  multiplexer                  | 0.05–0.2·10^-6           |
  8-bit CPU (8080, 6800)       | 2–5·10^-6                |
flight computer (80286)        | 10^-4 ... 10^-3 [3.18]   |


A measure of the unreliability is the failure density

$$q(t) = \frac{dQ(t)}{dt} = \frac{1}{N}\frac{dn_f(t)}{dt} = -\frac{dR(t)}{dt} \tag{3.11}$$


which becomes for λ = const.

$$q(t) = \lambda e^{-\lambda t} \tag{3.12}$$

It has the same time history as R(t), since q(t) = λR(t). Therefore, the failure density is
large for small t and small for large t. The number of units failing during a time period Δt is
approximately

$$\Delta n_f \approx N \lambda e^{-\lambda t} \Delta t \tag{3.13}$$


Example 3.2: Failed units 

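If λ = 10^-5 [1/h] and N = 10^6 units, the number of units failing in the first
year is approximately (with e^(-λt) = 0.916 for t = 8760 h)

$$\Delta n_f = 10^6 \cdot 10^{-5} \cdot 0.916 \cdot 8760 \approx 80253 \approx 8.02\,\%$$

In the fifth year (after t = 35040 h) the number of failed units is

$$\Delta n_f = 10^6 \cdot 10^{-5} \cdot 0.705 \cdot 8760 \approx 61731 \approx 6.17\,\%$$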


□ 
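The numbers of Example 3.2 can be recomputed with (3.13). The Python sketch below also evaluates the exact expected number of failures in a period, N[R(t) - R(t + Δt)], to show that (3.13) is only an approximation when λΔt is not small (here λΔt ≈ 0.088). Following the example, the density is evaluated at t = 8760 h for the first year and at t = 35040 h for the fifth year; the fifth-year result differs from the example in the last digits because the example rounds the exponential factor. The function names are illustrative.

```python
import math


def failed_approx(n, lam, t, dt):
    """Number of units failing in a period of length dt, approximation (3.13)."""
    return n * lam * math.exp(-lam * t) * dt


def failed_exact(n, lam, t_start, dt):
    """Exact expected number failing in [t_start, t_start + dt]:
    N * (R(t_start) - R(t_start + dt)) with R(t) = exp(-lam * t)."""
    return n * (math.exp(-lam * t_start) - math.exp(-lam * (t_start + dt)))


N, lam, year = 1_000_000, 1e-5, 8760.0

first = failed_approx(N, lam, 8760.0, year)    # evaluation points as in Example 3.2
fifth = failed_approx(N, lam, 35040.0, year)
print(round(first), round(fifth))

# Exact values for the first year [0, 8760 h] and the fifth year [35040 h, 43800 h]
print(round(failed_exact(N, lam, 0.0, year)), round(failed_exact(N, lam, 4 * year, year)))
```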

The exponential reliability function considered so far is the simplest one, and it has
been shown to approximate the behavior of electronic components well. Another
frequently used reliability function is the Weibull
distribution
$$R(t) = e^{-\left(\frac{t - t_0}{t_1}\right)^{\beta}}, \quad t \ge t_0 \tag{3.14}$$

where

t_0 failure-free period (location parameter);
t_1 characteristic lifetime (63 % of the distribution, scale parameter);
β shape parameter.

The Weibull function can be adjusted to a wide variety of different distributions,
compare, e.g., [3.19]. For β = 1 and t_0 = 0 it reduces to the exponential distribution
with λ = 1/t_1, and for β ≈ 3.55 it approximates the normal distribution.
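A short Python sketch of the Weibull reliability function, assuming the three-parameter form R(t) = exp(-((t - t_0)/t_1)^β) for t ≥ t_0 with R(t) = 1 before t_0, and of its failure rate λ(t) = -(dR/dt)/R from (3.3). It checks the reduction to the exponential law for β = 1, t_0 = 0 and shows that β < 1, β = 1 and β > 1 give decreasing, constant and increasing failure rates, i.e. the three phases of the bathtub curve. The function names and numbers are illustrative.

```python
import math


def weibull_reliability(t, beta, t1, t0=0.0):
    """Weibull reliability (assumed form): exp(-((t - t0) / t1) ** beta) for t >= t0.
    beta: shape parameter, t1: scale parameter (characteristic life), t0: failure-free period."""
    if t <= t0:
        return 1.0                      # no failures during the failure-free period
    return math.exp(-((t - t0) / t1) ** beta)


def weibull_failure_rate(t, beta, t1, t0=0.0):
    """Failure rate lambda(t) = -(dR/dt) / R for the form above, evaluated for t > t0."""
    if t <= t0:
        raise ValueError("failure rate is evaluated here only for t > t0")
    return (beta / t1) * ((t - t0) / t1) ** (beta - 1)


# beta = 1, t0 = 0: exponential law with lambda = 1 / t1
t1 = 25_000.0
print(weibull_reliability(20_000.0, 1.0, t1), math.exp(-20_000.0 / t1))

# Failure rate at 1000 h and 10000 h: decreasing (beta < 1), constant, increasing (beta > 1)
for beta in (0.5, 1.0, 3.0):
    print(beta, weibull_failure_rate(1_000.0, beta, t1), weibull_failure_rate(10_000.0, beta, t1))
```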

The reliability of electronic components depends strongly on the operating conditions,
such as environmental conditions and internal load. The failure rate therefore depends
on the external and internal temperature, voltage, current, power, vibration, dust and
humidity. (In [3.27] it is shown that the MTTF of ICs is inversely proportional to the
square of the current density and depends exponentially on the temperature.)
According to [3.14], these influences are considered by factors

$$\lambda = \lambda_b \, \pi_Q \, \pi_L \, \pi_T \, \pi_P \, \pi_E \tag{3.15}$$

for quality, learning, temperature, power and environment. Therefore, the failure rate
may vary by 2 to 3 decades. For standstill, 10 % of λ_b (the basic λ) is taken. A
temperature increase from 40°C to 70°C enlarges the failure rates by factors of 2 to 15.

The reliability of mechanical elements is strongly influenced by distortion, fatigue,
fracture, wear and corrosion. For more details see, e.g., [3.4], [3.14], [3.21], [3.20],
[3.13].

The required MTTF of components depends, of course, very much on the product
or process. For automotive electronics, for example, the following numbers are assumed
for reliability investigations [3.9]: time in service: 10 years; operating time: 3000 h;
average speed: 50 km/h; driving distance: 150000 km; number of rides: 50000. The
required lifetime of some components follows from [number of rides in 10 years ×
demands per ride] and results in: airbag: 50 h; ABS: 5·10^5 h; window heating: 10^4 h;
door locking: 5·10^4 h; wipers, lighting: 2.5·10^5 h.

3.1.3 Connected elements 

Reliability analysis generally requires the evaluation of connected elements. This is
based on reliability networks that represent the kind of connection, also called
combinational modelling. For a series connection of elements that may fail statistically
independently of each other, see Figure 3.3a, the overall system operates correctly only if
all elements operate correctly. For constant failure rates λ_i the overall reliability is

$$R_{tot}(t) = \prod_{i=1}^{m} R_i(t) = e^{-\sum_{i=1}^{m} \lambda_i t} = e^{-\lambda_{tot} t} \tag{3.16}$$

This is called the product law of reliability. Hence, the total failure rate and total 
MTTF become 
$$\lambda_{tot} = \sum_{i=1}^{m} \lambda_i \quad \text{and} \quad \mathrm{MTTF}_{tot} = \frac{1}{\lambda_{tot}} = \left[\sum_{i=1}^{m} (\mathrm{MTTF}_i)^{-1}\right]^{-1} \tag{3.17}$$


In order to reach a small overall failure rate, all elements should have similarly small
failure rates λ_i. Otherwise the largest λ_i will dominate and determine λ_tot.
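The series laws (3.16) and (3.17) translate into a few lines of Python. As a purely illustrative combination, the sketch chains a ball bearing, a coupling, a gear and a pump with the failure rates of Table 3.2 (this drive train is a hypothetical example, not one from the text) and shows how the largest rate, that of the pump, dominates λ_tot.

```python
import math


def series_system(rates):
    """Total failure rate and MTTF of independent elements in series, (3.17)."""
    lam_tot = sum(rates)                 # failure rates add up
    return lam_tot, 1.0 / lam_tot        # MTTF_tot = 1 / lambda_tot


def series_reliability(rates, t):
    """Product law of reliability (3.16) for constant failure rates."""
    return math.prod(math.exp(-lam * t) for lam in rates)


# Hypothetical drive train, failure rates in 1/h taken from Table 3.2
rates = {"ball bearing": 1.64e-6, "coupling": 5.54e-6, "gear": 4.69e-6, "pump": 43.65e-6}
lam_tot, mttf_tot = series_system(rates.values())
share_pump = rates["pump"] / lam_tot
print(f"lambda_tot = {lam_tot:.3e} 1/h, MTTF_tot = {mttf_tot:.0f} h, pump share = {share_pump:.0%}")
# prints: lambda_tot = 5.552e-05 1/h, MTTF_tot = 18012 h, pump share = 79%
```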


Fig. 3.3. Reliability network with elements R_1, ..., R_m for: (a) series connection; (b) parallel
connection. The function is either correct (1) or failed (0)


If the elements are arranged in a parallel connection, see Figure 3.3b, i.e. they are
redundant, and the unreliability Q_i(t) = 1 − R_i(t) describes the probability that element i
fails, then all m parallel elements have failed with probability

$$Q_{tot}(t) = \prod_{i=1}^{m} Q_i(t) = \prod_{i=1}^{m} \bigl(1 - R_i(t)\bigr) \tag{3.18}$$

This is the product law of unreliability. 

The reliability of the parallel connection is then 

$$R_{tot}(t) = 1 - \prod_{i=1}^{m} \bigl(1 - R_i(t)\bigr) = 1 - \prod_{i=1}^{m} \bigl(1 - e^{-\lambda_i t}\bigr) \tag{3.19}$$

The failure rate λ and the MTTF of parallel-connected elements can be determined
as follows. Using (3.10) for a constant failure rate and assuming the same λ_i = λ
for all elements leads to

$$\mathrm{MTTF} = \int_0^\infty R(t)\,dt = \int_0^\infty \left[1 - \left(1 - e^{-\lambda t}\right)^m\right] dt \tag{3.20}$$
For m = 3 it holds

$$R(t) = 1 - \left(1 - e^{-\lambda t}\right)^3 = 3e^{-\lambda t} - 3e^{-2\lambda t} + e^{-3\lambda t}$$

and therefore with (3.20)

$$\mathrm{MTTF} = 3\,\frac{1}{\lambda} - 3\,\frac{1}{2\lambda} + \frac{1}{3\lambda} = \frac{11}{6\lambda}$$


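This can also be expressed as [3.2]

$$\mathrm{MTTF} = \frac{1}{\lambda} + \frac{1}{2\lambda} + \frac{1}{3\lambda} = \frac{1}{\lambda}\left(1 + \frac{1}{2} + \frac{1}{3}\right) = \frac{11}{6\lambda}$$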

Generalizing leads to

$$\mathrm{MTTF} = \frac{1}{\lambda}\sum_{i=1}^{m}\frac{1}{i} \tag{3.21}$$

$$\lambda_{tot} = \frac{1}{\mathrm{MTTF}} = \frac{\lambda}{\sum_{i=1}^{m} 1/i} \tag{3.22}$$


Hence, the MTTF increases by a factor of 1.5 for two parallel elements, 1.833 for three
and 2.083 for four.
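Equation (3.21) reduces the MTTF of m identical parallel elements to a harmonic sum. The Python sketch below evaluates the gain factor exactly with `fractions.Fraction` and, as a cross-check, integrates (3.20) numerically with the trapezoidal rule over the finite horizon [0, 40/λ] (an assumption of this sketch; the neglected tail is of order e^-40). The function names are illustrative.

```python
import math
from fractions import Fraction


def parallel_gain(m):
    """MTTF of m identical parallel elements relative to one element, (3.21): sum of 1/i."""
    return sum(Fraction(1, i) for i in range(1, m + 1))


def parallel_mttf_numeric(lam, m, steps=100_000):
    """Trapezoidal approximation of (3.20): integral of 1 - (1 - exp(-lam t))**m over [0, 40/lam]."""
    t_max = 40.0 / lam
    h = t_max / steps

    def rel(t):
        return 1.0 - (1.0 - math.exp(-lam * t)) ** m

    total = 0.5 * (rel(0.0) + rel(t_max))
    for k in range(1, steps):
        total += rel(k * h)
    return total * h


lam = 1e-5
for m in (1, 2, 3, 4):
    gain = parallel_gain(m)
    print(m, gain, round(float(gain), 3), parallel_mttf_numeric(lam, m))
```

The exact fractions 3/2, 11/6 and 25/12 correspond to the factors 1.5, 1.833 and 2.083 quoted above.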

Table 3.4 and Figure 3.4 show reliability numbers for identical elements connected
either in series or in parallel. For the series connection, the overall reliability
decreases considerably with an increasing number of elements. The MTTF, for example,
is only one half for two serially connected elements, one third for three elements, etc.
However, redundancy by parallel elements improves the MTTF by 50 % for two
elements, 83 % for three elements, etc. The improvement is initially considerable,
but the incremental advantage becomes smaller with each additional parallel component.
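The reliability and MTTF columns of Table 3.4 follow from the product laws (3.16) and (3.19) together with (3.17) and (3.21). A small Python sketch that recomputes them for λ = 10^-5 1/h and t = 10^4 h; the computed values agree with the table up to rounding in the last digit.

```python
import math

lam, t = 1e-5, 1e4          # failure rate in 1/h and operating time in h, as in Table 3.4
r = math.exp(-lam * t)      # reliability of a single element, about 0.905

rows = []
for m in range(1, 5):
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    rows.append({
        "m": m,
        "R_series": r ** m,                    # product law of reliability (3.16)
        "MTTF_series": 1.0 / (m * lam),        # (3.17) with equal failure rates
        "R_parallel": 1.0 - (1.0 - r) ** m,    # product law of unreliability (3.19)
        "MTTF_parallel": harmonic / lam,       # (3.21)
    })

for row in rows:
    print(f"m={row['m']}: series R={row['R_series']:.3f}, MTTF={row['MTTF_series']:.3g} h | "
          f"parallel R={row['R_parallel']:.5f}, MTTF={row['MTTF_parallel']:.4g} h")
```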

For series-parallel connections, the system has to be divided into series and parallel
connected arrangements, such that the product laws of reliability and unreliability
can be applied consecutively, see, e.g., [3.22], [3.2].
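Applying the two product laws consecutively can be automated by describing the network as nested series and parallel groups. The Python sketch below uses one possible representation, nested tuples, which is an assumption of this sketch and not a notation from the text; the example network (one element with R = 0.95 in series with two redundant elements with R = 0.9 each) is hypothetical.

```python
import math


def network_reliability(node):
    """Reliability of a series-parallel network.
    A node is either a number (element reliability R_i) or a tuple (kind, children)
    with kind "series" or "parallel"; the product laws are applied recursively."""
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    values = [network_reliability(child) for child in children]
    if kind == "series":
        return math.prod(values)                          # product law of reliability (3.16)
    if kind == "parallel":
        return 1.0 - math.prod(1.0 - r for r in values)   # product law of unreliability (3.18), (3.19)
    raise ValueError(f"unknown connection type: {kind}")


# Hypothetical network: element A (R = 0.95) in series with redundant elements B1, B2 (R = 0.9 each)
system = ("series", [0.95, ("parallel", [0.9, 0.9])])
print(network_reliability(system))   # approximately 0.95 * (1 - 0.1 * 0.1) = 0.9405
```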


3.2 Maintainability 

Definition: 

“Maintenance is understood as an action taken to retain a system in, or return a system
to, its designed operating condition. It extends the useful life of systems and ensures
the optimum availability of installed equipment or equipment for emergency use.”

Maintenance is quite often very expensive and may exceed the investment cost over
time.





3 Reliability, Availability and Maintainability 


Table 3.4. Overall reliability of m connected identical elements for equal failure rates
λ_i = 10^-5 [h^-1], i = 1, 2, ..., m, and an operating time of t = 10^4 [h]

   |               series connection                 |               parallel connection
 m | R_tot (t = 10^4 h) | λ_tot [h^-1] | MTTF_tot [h]  | R_tot (t = 10^4 h) | λ_tot [h^-1] | MTTF_tot [h]
 1 | 0.905              | 10^-5        | 10^5          | 0.905              | 10^-5        | 10^5
 2 | 0.818              | 2·10^-5      | 0.5·10^5      | 0.9910             | 0.667·10^-5  | 1.499·10^5
 3 | 0.741              | 3·10^-5      | 0.33·10^5     | 0.99914            | 0.545·10^-5  | 1.83·10^5
 4 | 0.670              | 4·10^-5      | …             | …                  | …            | …
Figure 3.5 gives a survey of the different kinds of maintenance activities. Usually,
maintenance is a planned action. Preventive or scheduled maintenance begins
with inspection and includes, e.g., cleaning, adjustment and lubrication at predetermined
intervals. It also includes the replacement of minor components subject to wear
before they fail, depending on the inspection. Corrective maintenance includes minor
repairs and planned regular overhauls, i.e. complete replacement of wearing components.
Unplanned maintenance is performed in an emergency situation to avoid
immediate drastic failures, shut-downs or losses, or for safety reasons.
Fig. 3.4. Mean Time To Failure (MTTF) for m connected elements with equal failure rate
λ = 10^-5 [h^-1]: (a) series connection; (b) parallel connection


Furthermore, maintenance can be performed while the system is in operation or
by shutting it down temporarily. The former applies, for example, to ship engines
or electric transmission lines. Usually, however, the system has to be taken out of
service, leading to down times.
Fig. 3.5. Kinds of maintenance, [3.2]


Maintainability is a quality of several features which enable a system to be maintained.
This includes proper design, required tools, standardized components, grouped
subsystems (modularity), aids for trouble-shooting, spare parts and logistics. The
down time is the interval in which the system is not in acceptable operation; it can
be divided into diagnostic time, active repair time, logistic time and administrative
time.

A measure of maintainability is the probability that a failed system will be repaired
within a time period T_R. Then the expected repair time can be stated, called
MTTR (Mean Time To Repair)

$$\mathrm{MTTR} = E\{T_R\} = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} T_{R,i} \tag{3.23}$$

compare Figure 3.6.
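In practice (3.23) is evaluated as a sample mean over recorded repairs, and the maintainability measure mentioned above, the probability of completing a repair within T_R, as a relative frequency. A minimal Python sketch; the repair times are invented for illustration only and the function name is hypothetical.

```python
def maintainability_stats(repair_times, t_r):
    """Return MTTR as the sample mean of the repair times, cf. (3.23), and the
    relative frequency of repairs completed within the time period t_r."""
    n = len(repair_times)
    mttr = sum(repair_times) / n
    p_within = sum(1 for t in repair_times if t <= t_r) / n
    return mttr, p_within


# Hypothetical repair times in hours (illustrative data, not from the text)
repairs = [2.0, 3.5, 1.0, 8.0, 4.5, 2.5, 6.0, 3.0]
mttr, p4 = maintainability_stats(repairs, t_r=4.0)
print(mttr, p4)   # 3.8125 0.625
```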
Fig. 3.6. Time history of the function fct for repairable elements. T_F,i: time to failure;
T_R,i: repair time
3.3 Availability

Definition: 

“The availability is the probability that a considered system will be functioning
correctly at any given time t. It is a time-dependent function like reliability, however, for
systems which are repairable.”


Reliability describes the probability that the operation is failure-free up to time t. 
Availability considers the probability of the failure-free operation at time t, including 
possible down times for repair. A measure for availability is 


$$\text{Availability } A = \frac{\text{time in operation}}{\text{total time}} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} = \frac{1}{1 + \mathrm{MTTR}/\mathrm{MTTF}} \tag{3.24}$$


A high availability is obtained by a large MTTF and a small MTTR. For a repairable
system the Mean Time Between Failures is

$$\mathrm{MTBF} = \mathrm{MTTF} + \mathrm{MTTR}$$

If MTTR ≪ MTTF, then

$$\mathrm{MTBF} \approx \mathrm{MTTF} \tag{3.25}$$
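A minimal Python sketch of (3.24) and of MTBF = MTTF + MTTR. The values MTTF = 1000 h and MTTR = 10 h are hypothetical, chosen only to illustrate that MTBF ≈ MTTF when MTTR ≪ MTTF.

```python
def availability(mttf, mttr):
    """Availability (3.24): A = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def mtbf(mttf, mttr):
    """Mean Time Between Failures of a repairable system."""
    return mttf + mttr


mttf, mttr = 1000.0, 10.0          # hypothetical values in hours
a = availability(mttf, mttr)
print(f"A = {a:.4f}, MTBF = {mtbf(mttf, mttr):.0f} h")   # A = 0.9901, MTBF = 1010 h
```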


3.4 Fault management for total life cycles 

The discussed properties of products, like reliability, availability and maintainability,
have to be seen in the context of the overall life cycle. Figure 3.7 shows the various
steps, beginning with planning and design, through realization/manufacturing,
installation/commissioning and operation, up to decommissioning and waste treatment. If
recycling, reconditioning and reuse are possible, some parts flow back in a loop.
The active or functional phases of realization, installation and especially operation
are supported by maintenance, repair or fault management. This support increases
the quality and performance of the products and their lifetime. Supervision,
fault detection and diagnosis especially help to improve these functional
product phases.

Fault management describes the actions after the (sudden) appearance of faults to 
maintain the operations. It includes, for example, emergency maintenance or repair, 
use of spare parts, or reconfiguration in the case of static or dynamic redundancy, see 
Chapter 19. 


3.5 Some failure statistics 

In order to obtain some quantitative information on reliability, and an understanding
of the appearance and kind of faults in technical processes, some published statistics
are gathered in the following tables. The bases of the statistical numbers given are,
of course, rather different. The numbers stem from publications of technical
organizations, service associations, insurance companies, etc. They are divided into
components and systems.

Fig. 3.7. Life cycle of technical products and measures to increase quality, performance and
lifetime

3.5.1 Statistics of components 


Table 3.5. Failure statistics of AC motors (11 ... 50 kW ... 1 MW), [3.24]

Cause of failure      | percentage of all failures [%]
bearings              | 51.1
stator windings       | 15.8
external equipment    | 15.6
broken cage or rings  | 4.7
axle, clutch          | 2.4
not defined           | 10.4

Failure rates: low power (< 100 kW): 5 % p.a. or λ = 5.8·10^-6 [h^-1]
               high power (> 100 kW): 10 % p.a. or λ = 1.16·10^-5 [h^-1]


References: [3.24], [3.8], [3.15]. 
Hence, the most frequent failures occur in the bearings (material fatigue with crack
generation and pitting, wear, corrosion, plastic deformation, or faults during assembly
and lack of cooling), the stator windings (overheating, loss of insulation, loose iron
plates) and the squirrel-cage rotor (broken rotor bars, asymmetries, eccentricities,
resonance frequencies, vibrations), see Table 3.5. External equipment is, e.g., power
electronics with frequency converters, see the summary in [3.26].


Table 3.6. Failure statistics of centrifugal pumps, [3.25]

Cause of failure   | percentage of all failures [%]
sliding ring seal  | 31
ball bearings      | 22
leakage            | 10
motor drive        | 10
rotor              | 9
oil bearings       | 8
clutches           | 4
split tubes        | 3
casing             | 3


The numbers in Table 3.6 are based on a survey among chemical industry, water
and wastewater treatment companies. The modes of operation are: permanent operation
59 %, daily operation 19 %, short-time operation 22 %. Inspections take place every three
months; unplanned defects requiring repair occur every nine months. The most frequent
failures concern the sliding ring seals and the ball bearings. Causes of operational
malfunctions are: cavitation, gases in liquids, blockage through closed valves, dry running
through missing fluid, wear through erosion (particles), corrosion, ball bearings, split flow,
deposits and oscillations, see the summary in [3.26]. According to another survey
among customers, the priorities are: 1. reliability; 2. energy consumption; 3. no
leakages; 4. price; 5. noise; 6. control range.

References: [3.25], [3.16], [3.11].


Table 3.7. Failure statistics of hydraulic actuators (aircraft components)

Cause of failures          | percentage of all failures [%]
spool                      | 32
valve                      | 19
cylinder                   | 16
mechanics                  | 16
power components (pumps)   | 3
others                     | 14


With 51 %, the largest percentage of failures arises in the spool valve (spool and valve
together), which controls the oil flow to the cylinder, see Table 3.7. Failures of the spool
are erosion at the edges (65 %), dirt (20 %) and external leakage (10 %). The failures of the
valve are internal leakage (10 %), insufficient function (35 %) and external leakage
(15 %). The cylinder shows failures through external leakage at the rod seals (58 %),
internal leakage (14 %) or a broken or cracked rod.
Table 3.8. Lifetime costs of sensors and actuators over 15 years of operation, 388 plants, 41
EUR/h labor cost, [3.7]

Statement of cost                   | pressure sensor | temperature sensor | flow sensor   | valve
                                    | EUR     %       | EUR     %          | EUR     %     | EUR     %
acquisition                         | 816     31.5    | 230     27.1       | 1990    28.7  | 832     30.5
planning, assembling                | 490     18.9    | 388     45.8       | 1173    16.9  | 735     27.0
maintenance                         | 460     17.7    | 153     18.1       | 918     13.2  | 153     5.6
repair                              | 745     28.7    | 0       0          | 2704    39.0  | 534     33.9
others                              | 82      3.2     | 77      9.0        | 153     2.2   | 82      3.0
total costs                         | 2593    100     | 848     100        | 6938    100   | 2336    100
(maintenance + repair)/acquisitions | 1.5             | 0.7                | 1.8           | 1.3
overall cost/acquisitions           | 3.2             | 3.7                | 3.5           | 3.3


The long-term consideration of sensors and actuators in plants of the chemical
industry, Table 3.8, shows several interesting effects. The total costs of flow sensors are
highest, followed by pressure sensors and valves. Temperature sensors are cheapest.
Maintenance and repair costs over 15 years are especially high for the flow and
pressure sensors and the valves. The overall costs are about 3.2 to 3.7 times the
acquisition costs. These numbers underline the great significance of early fault detection
for reducing at least part of the costs for maintenance and repair.

3.5.2 Statistics of systems 

The statistics of the German automobile club ADAC, Table 3.9, are based on
approximately 20000 to 100000 breakdowns and service calls per year. The most frequent
failures (increasing within the last 5 years) are those of the electrics, like battery,
generator, V-belt, starter blockage, loosened cables or burned fuses. The ignition
system showed defects because of the immobilizer (theft protection), ECUs, spark plugs


Table 3.9. Breakdown of passenger cars, ADAC Pannenstatistik 1999, [3.1]

Cause of failure    | percentage of all failures [%]
general electrics   | 31.3 (35.1)
ignition            | 14.1 (14.1)
engine              | 12.2 (10.5)
cooling system      | 8.4 (7.2)
fuel system         | 6.6 (6.0)
fuel injection      | 6.2 (5.7)
wheels, tires       | 6.8 (6.9)
clutch, gear        | 5.8 (5.0)
chassis             | 4.2 (4.8)
exhaust system      | 2.2 (-)
brake system        | 1.4 (-)
suspension          | 0.6 (-)
steering system     | 0.3 (-)
or marten bites. Problems of the engine were broken timing belts or camshaft chains,
oil pump defects, too little oil and overheating.

Hydraulic brakes of passenger cars in Germany (1999) are responsible for about
900 of about 36000 accidents with injuries, i.e. 2.5 % of all accidents and 18 %
of all accidents due to technical faults (accidents during rain, snow and ice account for about
48.5 %), [3.23], [3.5]. The reasons for brake failures are leakage or gas inclusions.
60 % are due to lack of maintenance (porous flexible brake hoses, corroded brake
lines, cut seals at cylinder pistons) and 10 % to incorrect assembly and repair.


Table 3.10. Failures of the water cooling system of passenger cars, ADAC Pannenstatistik 1999,
[3.1]

Cause of failure                    | percentage of all failures [%]
V-belt                              | 29
flexible pipes (heating, cooling)   | 20
cylinder head seals                 | 16
water pump                          | 10
coolant liquid                      | 6
thermostat                          | 6
radiator                            | 6
expansion vessel                    | …


The cooling system accounts for about 8 % of all vehicle breakdowns, see Tables 3.9 and 3.10.
Most of the failures are caused by the V-belt, the flexible pipes for heating and cooling,
the cylinder head seals and the water pumps. By leak detection, e.g. by measuring the
coolant liquid level, almost 48 % of all failures can be detected.


Table 3.11. Damage statistics of components in the chemical industry during assembly and
commissioning, [3.3], [3.12]

Failed components                 | percentage of all failures [%]
pipes                             | 10
columns, tanks, cooling towers    | 9
containers, reactors, vessels     | 18
filters, mufflers                 | 9
heat exchangers                   | 14
ovens                             | 14


Table 3.12. Damage statistics of components in the process industries (refineries, petrochemical,
gas production, terminals, general process industries). Based on 2023 failures, [3.12]

Failed components          | percentage of all failures [%]
pipes                      | 5–22
tanks                      | 5–21
containers                 | 5–20
reactors                   | 3–20
heat exchangers            | 4–5
ovens, vessels, heaters    | 4–7


A summary of damages and accidents in the process industries is given in [3.12].
An extract is shown in Tables 3.11 and 3.12. The relative frequency of accidents
is especially high in the following industries: paper, cellulose, chlor-alkali, fertilizer,
chemical (US industries, 1994–1999), with up to 200 accidents per year, e.g. in the
paper industry. All these statistics are summarized in [3.12].

Hence, pipes and pipelines are the most frequent cause of damages (according to other
sources even up to 33 % or 46 %), followed by tanks, containers and reactors. The
reasons for damages of pipes are material weakening through erosion/corrosion or
chemical corrosion, especially at elbows, and unsuitable material. Other reasons are
overpressure (20 %), corrosion (16 %) and human operating errors (31 %). Human errors
as a whole account for 41 %, with shares of incorrect maintenance (39 %), design (27 %)
and operation (14 %). Damages of valves are also included within pipes, with about
4 % to 11 % of pipe damages.

Tanks and containers cause the next most frequent damages, especially in the
petrochemical industry, for example by overfilling or failure of welding seams. These
damages of pipes, tanks and containers account for about 30–50 % of all events.
This means that especially the simpler components, and not the more complicated
ones, increase the overall damages.

The average costs per event for material and property damage / business interruption
are: petrochemical industry 20/28 million US $; refineries 15/17 million US $; chemical
processes 11/15 million US $; and machines, electronics 5/7 million US $. This is based on
2700 damage events since 1984.


3.6 Problems 

1) For 10000 devices 10 failures are observed each year. Determine the failure rate 
and MTTF. 

2) The mortality rate of a 30-year-old man is λ = 10^-3 [1/year]. Determine the
estimated lifetime as MTTF if the man stayed in the same health condition.

3) Estimate the infant phase, normal phase and ageing phase of a human and an 
automobile. Determine the reasons for failures for all three life phases. 

4) Let the failure rate of a manufacturing unit be λ_i = 1 failure per year. Determine
the overall failure rate and MTTF if 2, 3, 5, 10 units with the same failure rate
are connected in series. How do these numbers change for a parallel connection?

5) Passenger cars have about 300 h/year operating time. Given a failure rate of
λ = 10^-3 [1/h], how many cars of a population of 10^6 can fail in the 1st and
in the 5th year?

6) The hydraulic brake system of passenger cars consists of redundant paths in a
parallel configuration. The failure rate of one path is λ = 10^-5 [1/h]. Determine
the MTTF of the whole brake system.

7) Determine the availability of a machine tool for MTTF = 1 year and MTTR = 1 
day. 

8) Determine the failure rate λ and the MTTF of an electric motor with gear, modelled as a
series connection of a switch, cable, motor and gear, using the failure rates of
Tables 3.2 and 3.3.
9) Under the same driving conditions, one type of truck has an MTTF = 8 months and a
repair time of 2 days, another type an MTTF = 5 months and a repair time of 1 day.
Compare their availabilities.

10) About 5 ppm of the switching solenoid relays in automobiles are faulty. How
many cars are affected if each contains 50 relays? What is the failure rate if
these faulty relays show up within 1 year or within 5 years?



4 Safety, Dependability and System Integrity


For safety-related systems, all aspects of reliability, availability, maintainability and
safety (RAMS) have to be considered, because they are relevant for the responsibility
of the manufacturers and the acceptance by customers. To meet safety requirements,
special procedures were developed in different technical disciplines like railway,
aircraft, space, military and nuclear systems. These procedures are covered by the terms
system integrity or system dependability.

The various kinds of safety requirements lead to different levels of integrity of
safety-related systems, from the lowest to the highest requirements. In this context,
“integrity” means more precisely “safety integrity”, with the following definition:

“Safety integrity is the probability of a safety-related system satisfactorily performing
the required safety functions under all the stated conditions within a stated
period of time” [4.2].

Safety and reliability are generally achieved by a combination of 

• fault avoidance; 

• fault removal; 

• fault tolerance; 

• fault detection and diagnosis; 

• automatic supervision and protection. 

Fault avoidance and removal have to be accomplished mainly during the design
and testing phase. For investigating the effect of faults on reliability and safety
during the design, and also for type certification, a range of analysis methods has been
developed, such as:

• reliability analysis; 

• event tree analysis (ETA); 

• fault tree analysis (FTA); 

• failure mode and effect analysis (FMEA); 

• hazard analysis (HA); 

• risk classification. 
Some of the methods developed in special fields are more or less accepted in
other fields. These methods are briefly reviewed in this chapter to find an appropriate
procedure for developing safety-related systems, see also [4.13] and [4.2].


4.1 Reliability analysis 

Reliability is usually described as the probability of a component or system to function
correctly over a certain period of time and under certain operating conditions.
Unreliability, therefore, arises if faults develop or appear which lead to a failure of the
component or the system. Faults and failures may be random or systematic [4.2], as
discussed in Section 3.1, see Table 3.1. Random faults occur at random times as a result
of degradation mechanisms, especially in electronic hardware components. Degradation
mechanisms depend on the components' quality, manufacturing tolerances and
operational stress. Systematic faults or failures are related in a deterministic way to a
certain cause, such as human errors, or to causes in manufacturing processes or operational
procedures. Random hardware failures can be predicted with reasonable accuracy, based
on experience with large numbers of pieces. Systematic failures, however, cannot
be accurately predicted based on statistics.

The reliability of elements with random faults is described by the reliability function
R(t), the failure rate λ [h^-1] (e.g. failures per 10^6 hours) or the Mean Time To Failure
MTTF [h], see Chapter 3. Based on these measures, the reliability of connected
elements can be estimated. Series connections deteriorate and parallel (redundant)
connections improve the reliability.

Failures of mechanical systems arise through distortion, fatigue and fracture,
wear and corrosion. The reliability of mechanical systems can be improved considerably
by oversizing, protection (against corrosion) and wear reduction; they therefore generally
need no redundancy. Hydraulic systems are more subject to wear and to possible faults
through the fluid and the seals. Therefore, redundancy plays a significant role, e.g. in
aircraft. The reliability of electrical systems can be improved by almost all measures.
The reliability of electronic hardware components depends greatly on the manufacturing
process, environmental conditions and internal load, and failures appear suddenly
and more randomly. Therefore, the reliability can be improved especially by
redundancy and protection. To cope with systematic software faults, usually only
redundancy with diversity and maintenance to remove bugs help. Hence, the ways to
improve reliability are different for the different types of components, see the summary
in Table 4.1, compare [4.9]. This influences the design of safety-related systems
considerably.


4.2 Event tree analysis (ETA) and fault tree analysis (FTA) 

The event tree analysis (ETA) begins with an event at a component (a basic fault)
and propagates it through all the components in normal or failed operating mode
Table 4.1. Improvement of reliability for different components, compare [4.9]. ++ very large
potential, + large potential, 0 small potential, – not usable

improvement of reliability | mechanical | hydraulic | electrical | electronic hardware | software
oversizing                 | ++         | +         | +          | +                   | 0
maintenance                | ++         | ++        | +          | 0                   | +
protection                 | ++         | ++        | +          | ++                  | 0
wear reduction             | ++         | +         | +          | 0                   | 0
redundancy                 | 0          | +         | +          | ++                  | +
  • static                 | 0          | +         | +          | ++                  | 0
  • dynamic                | 0          | 0         | +          | ++                  | +
  • diversity              | 0          | 0         | 0          | ++                  | ++


and determines the consequence of the system’s function, i.e. either active or inac¬ 
tive, Figure 4.1. Therefore, the possible expected (normal) and failed conditions of 
all connected components have to be considered, including logic AND and OR oper¬ 
ations. As each event causes a branch in the diagram, a tree with N events will have 
2 n branches. Therefore, the ETA results in very large trees because the normal as 
well as the failed functions are considered. Note that only binary states (yes or no) 
are taken into account. 
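The combinatorial growth of an event tree can be seen in a minimal Python sketch. It assumes a hypothetical system in which components A and B work in parallel and are followed by component C in series; the names `event_tree` and `structure` are illustrative only. Every combination of working and failed components forms one branch, so N components give 2^N branches.

```python
from itertools import product


def event_tree(components, system_active):
    """Enumerate all branches of a binary event tree.

    components    -- component names; each is either working (True) or failed (False)
    system_active -- function mapping {name: state} to True if the system still
                     performs its function
    Returns one (states, active) pair per branch, i.e. 2**len(components) pairs.
    """
    branches = []
    for states in product((True, False), repeat=len(components)):
        assignment = dict(zip(components, states))
        branches.append((assignment, system_active(assignment)))
    return branches


# Hypothetical structure: A and B in parallel, followed by C in series.
def structure(s):
    return (s["A"] or s["B"]) and s["C"]


tree = event_tree(["A", "B", "C"], structure)
print(len(tree))                          # 8 branches = 2**3
print(sum(active for _, active in tree))  # 3 branches leave the system active
```

Adding a fourth or fifth component doubles the number of branches each time, which is why ETA trees become very large.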

The fault tree analysis (FTA) proceeds in the reverse direction of ETA. It begins with the failure of the system (as top event) and determines the possible causes in the components, i.e. the basic component failures, including the logic operations, Figure 4.2. As only the failed events are taken into account, the tree becomes smaller than for ETA. Here, too, only binary states are considered.
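For a fault tree, the probability of the top event can be estimated from the probabilities of the basic events. The sketch below assumes statistically independent basic events (an assumption that common-cause failures would violate) and uses hypothetical probabilities. An AND gate multiplies the input probabilities; an OR gate uses the complement of the product of the non-occurrence probabilities.

```python
from math import prod


def and_gate(*p):
    """Output event occurs only if all (independent) input events occur."""
    return prod(p)


def or_gate(*p):
    """Output event occurs if at least one (independent) input event occurs."""
    return 1.0 - prod(1.0 - pi for pi in p)


# Hypothetical tree: the top event "system failed" occurs if the two
# parallel (redundant) components 1 and 2 both fail OR component 3 fails.
p1, p2, p3 = 0.1, 0.1, 0.01
p_top = or_gate(and_gate(p1, p2), p3)
print(round(p_top, 6))  # 0.0199
```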


4.3 Failure mode and effects analysis (FMEA) 

The FMEA is a formalized method to consider all components, their functions, failure modes and the resulting system failures. The method was developed in the 1960s and used by NASA for the Apollo project, later for aerospace and nuclear power stations, [4.1]. Now it is a standard method also in the automotive industry [4.14], [4.3].

FMEA starts with listing all the components, their operating modes and their failure modes. It then considers possible causes for each failure mode and describes their effects on the unit under consideration and on the complete system. Furthermore, counter actions are listed. Usually only single failures are considered. Because FMEA worksheets are used, it is a formalized method, Table 4.2. The method is used to detect weak spots of the design in early and later stages of development.

The procedure results in a tree-like network structure with binary states, similar to an event tree, however, without normal operating modes and without logic interconnections. Therefore, it does not blow up like ETA. The strength of FMEA is its completeness. However, it may result in a very time-consuming procedure.
Fig. 4.2. Fault tree analysis (FTA) for six components with parallel connections for 1, 2 and 5, 6 and serial connection for 3, 5 (deductive analysis from special effects to causes)
Table 4.2. FMEA worksheet

Component | Failure mode | Failure causes | Failure effect on unit | Failure effect on system | Counter action

FMEA can favorably be combined with FTA, because it yields the possible system failures, which are the inputs of FTA. Therefore, FMEA and FTA complement each other.
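An FMEA worksheet with the columns of Table 4.2 maps naturally onto a record type. The following sketch uses hypothetical entries for an electromagnetic valve and collects the distinct failure effects on the system as candidate top events for an FTA. The class and function names are assumptions for illustration, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One line of an FMEA worksheet with the columns of Table 4.2."""
    component: str
    failure_mode: str
    failure_causes: str
    effect_on_unit: str
    effect_on_system: str
    counter_action: str


def fta_top_events(rows):
    """Distinct system failures; each can serve as the top event of a fault tree."""
    return sorted({row.effect_on_system for row in rows})


# Hypothetical entries for an electromagnetic valve (illustration only).
worksheet = [
    FmeaRow("coil", "open circuit", "wire break", "no magnetic force",
            "valve stays closed", "monitor coil current"),
    FmeaRow("spring", "broken", "fatigue", "no return force",
            "valve stays open", "oversize spring"),
    FmeaRow("piston", "stuck", "contamination", "no motion",
            "valve stays closed", "filter in supply line"),
]
print(fta_top_events(worksheet))  # ['valve stays closed', 'valve stays open']
```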

Failure modes, effects and criticality analysis (FMECA) is an extension of FMEA. The importance of each component failure is taken into account by considering its probability of occurrence and its effects on the system, i.e. by expressing the risk of operation.

A Failure Risk Priority Number (FRPN) can be stated to show the criticality of different failures

$$\mathrm{FRPN} = A \times B \times E \qquad (4.1)$$

where

• A: probability of failure occurrence,

• B: effects on the system,

• E: detection rate.

For A, B and E, ranking numbers between 1 and 10 are used, as shown in Tables 4.3 and 4.4, [4.7], [4.11], [4.14].
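Equation (4.1) is a plain product of three rankings, so the FRPN ranges from 1 to 1000. A minimal sketch, assuming integer rankings from 1 to 10 and hypothetical failure modes, computes the number and sorts the failures by criticality:

```python
def frpn(a, b, e):
    """Failure risk priority number, Eq. (4.1): FRPN = A * B * E.

    a -- occurrence ranking (Table 4.3), b -- effect ranking (Table 4.4),
    e -- detection ranking; each an integer from 1 to 10.
    """
    for name, value in (("A", a), ("B", b), ("E", e)):
        if not (isinstance(value, int) and 1 <= value <= 10):
            raise ValueError(f"{name} must be an integer in 1..10, got {value!r}")
    return a * b * e


# Hypothetical failure modes with their (A, B, E) rankings.
failures = {"sensor drift": (4, 6, 7), "cable break": (2, 9, 2), "valve stuck": (3, 8, 5)}
ranked = sorted(failures, key=lambda name: frpn(*failures[name]), reverse=True)
print(ranked)  # ['sensor drift', 'valve stuck', 'cable break']
```

Here the FRPN values are 168, 120 and 36, so the most frequent but moderately severe drift ranks above the severe but rare and well detectable cable break.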


Table 4.3. Ranking numbers A for the evaluation of the probability of failure occurrence for passenger cars, [4.14]. The failure rate is calculated with the assumption of 300 operating hours per year

Ranking number A | Evaluation | Failures/year ν [ppm] | Failure rate λ [%/h]
10 | very high | 500000 | 1.67·10^-1
9 | very high | 100000 | 0.33·10^-1
8 | high | 50000 | 1.67·10^-2
7 | high | 10000 | 0.33·10^-2
6 | medium | 5000 | 1.67·10^-3
5 | medium | 2000 | 0.67·10^-3
4 | medium | 1000 | 0.33·10^-3
3 | low | 100 | 0.33·10^-4
2 | low | 50 | 1.67·10^-5
1 | very low | 1 | 0.33·10^-6
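The failure rate column of Table 4.3 follows from the failures per year under the stated assumption of 300 operating hours per year. The sketch below reproduces the tabulated values; the unit %/h is an assumption inferred from the tabulated numbers, and the function name is illustrative.

```python
OPERATING_HOURS_PER_YEAR = 300  # assumption stated in the caption of Table 4.3


def failure_rate_percent_per_hour(failures_per_year_ppm):
    """Convert failures/year in ppm into a failure rate in %/h (assumed unit)."""
    failures_per_year = failures_per_year_ppm * 1e-6          # ppm -> fraction
    rate_per_hour = failures_per_year / OPERATING_HOURS_PER_YEAR
    return rate_per_hour * 100                                 # fraction -> percent


for ppm in (500000, 2000, 1):
    print(ppm, f"{failure_rate_percent_per_hour(ppm):.2e}")
# 500000 1.67e-01
# 2000 6.67e-04
# 1 3.33e-07
```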
Table 4.4. Ranking number B for the evaluation of the effects of failures for vehicles

Ranking number B | Fundamental effects | Effects for systems | Effects on occupants | Effects for vehicles
10, 9 | catastrophic | system damage | death | destruction, fire
8, 7 | major | system shut down | serious injury | serious damage
6, 5 | severe | subsystem down | injury | failure
4, 3 | minor | partly down | slight injury | minor failure
2, 1 | no effect | no effect | no injury | no effect


4.4 Hazard analysis (HA)

Hazards are undesirable system conditions with the potential to cause or to contribute to damage or an accident. Hazard analysis is therefore a basic procedure to provide awareness of, and information on, the system's safety-critical components and states, [4.6].

Based on the FMEA, all safety-critical failures are extracted with the goal to identify hazards (potential sources of harm, i.e. physical injury or damage), their effects on the system and ways to minimize or avoid them. This can be represented in a worksheet similar to the one for FMEA. The results are given as binary states, i.e. yes or no. An example is given for an electromechanical brake in [4.4].

Once hazards are identified, their causes can be analyzed by proceeding with a 
fault tree analysis (FTA) starting with the hazard as top event. 

A more detailed analysis is provided by the Hazard and Operability Studies (HAZOP), developed in the 1960s in the chemical industry, [4.13]. Here, the effects of deviations from normal operating conditions are investigated in a systematic way. For example, parametric changes and out-of-range values are considered if they could result in a hazard. Safety features to compensate hazards are also taken into account. HAZOP starts with identifying the interconnection of components, looks at the flow of materials or signals and defines attributes by using a limited number of guide words like "no, more, less, reverse, early, late". This means that not only binary values are considered, but ranges of values in the sense of a stepwise classification. By analyzing causes and consequences for the system, possible hazards are identified. The results can then be stated in a fault tree.
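The systematic part of a HAZOP session, pairing each guide word with each parameter of a flow to obtain candidate deviations, can be sketched as follows. The guide words are those quoted above; the parameters "flow" and "pressure" are hypothetical, and the judgment on causes and consequences naturally stays with the team of engineers.

```python
from itertools import product

# Guide words quoted in the text; the parameter list below is hypothetical.
GUIDE_WORDS = ("no", "more", "less", "reverse", "early", "late")


def hazop_deviations(parameters, guide_words=GUIDE_WORDS):
    """Combine every parameter with every guide word into a candidate deviation.

    Meaningless pairs are discarded by the team; the remaining ones are
    examined for causes, consequences and possible hazards.
    """
    return [f"{word} {param}" for param, word in product(parameters, guide_words)]


deviations = hazop_deviations(["flow", "pressure"])
print(len(deviations))  # 12
print(deviations[:3])   # ['no flow', 'more flow', 'less flow']
```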

It is typical for these reliability and safety analysis procedures that they are compiled in consecutive sessions by a team of engineers whose expertise covers the whole system. These analysis procedures generally have to be repeated several times during project development.


4.5 Risk classification 


As hazards represent situations of potential danger, they may lead to an accident, which is an unintended event causing injury, death, environmental or material damage. In order to judge the relative importance of hazards and their possible acceptability, the associated risk has to be considered. Risk is thus determined by the combination of the probability (frequency) of a hazard and its severity (consequence). The risk can be classified by using either qualitative or quantitative methods, [4.13].


Table 4.5. Accident probability ranges for military systems, [4.13]

Accident frequency | Occurrences during operational life considering all instances of the system
frequent | likely to be continually experienced
probable | likely to occur often
occasional | likely to occur several times
remote | likely to occur some time
improbable | unlikely, but may exceptionally occur
incredible | extremely unlikely that the event will occur at all

The probability of hazards is, e.g. classified in six levels, Table 4.5, [4.2]. Table 4.6 shows qualitative probability terms for aircraft systems. The severity is subdivided into four classes from catastrophic to negligible. The risk of hazards is also classified into four classes, from intolerable to negligible risk, Table 4.7. The qualitative risk classification then follows by considering both the probability and the severity, Table 4.8.


Table 4.6. Aircraft systems hazard probabilities, [4.10], [4.13]

System criticality | Effect on aircraft | Category of effect | Qualitative probability terms
critical | loss of aircraft | catastrophic | extremely improbable / extremely improbable
essential | large safety reduction | hazardous | improbable / extremely remote
 | significant safety reduction | major | remote
non-essential | operating limitations, emergency procedure | minor | probable / reasonably probable
 | normal or nuisance | | frequent
Table 4.7. Interpretation of risk classes, [4.2]

Risk class | Interpretation
class I | intolerable risk
class II | undesirable risk, and tolerable only if risk reduction is impractical or if the costs are greatly disproportional to the improvement gained
class III | tolerable risk if the cost of the reduction would exceed the improvement gained
class IV | negligible risk


Table 4.8. Risk classification of hazards or accidents, [4.2]

Frequency \ Consequence | catastrophic | critical | marginal | negligible
frequent | I | I | I | II
probable | I | I | II | III
occasional | I | II | III | III
remote | II | III | III | IV
improbable | III | III | IV | IV
incredible | IV | IV | IV | IV


Quantitative risk measures can, e.g. be obtained by calculating a hazard risk number

$$R = F_h \times C \qquad (4.2)$$

where $F_h$ is the frequency (probability) of the hazard and $C$ the consequence (severity), [4.2]. If the risk has to be reduced, this can be accomplished by reducing both risk parameters, the probability and the severity of hazards. The assignment of risk measures depends a lot on the technical area, e.g. nuclear, aircraft or heating and ventilating systems. Therefore, industry-specific standards exist.
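Table 4.8 is a lookup from frequency and consequence to a risk class, and (4.2) is a simple product. A Python sketch of both, with names chosen for illustration and hypothetical numbers in the example:

```python
# Consequence columns and frequency rows of Table 4.8.
CONSEQUENCES = ("catastrophic", "critical", "marginal", "negligible")
RISK_MATRIX = {
    "frequent":   ("I",   "I",   "I",   "II"),
    "probable":   ("I",   "I",   "II",  "III"),
    "occasional": ("I",   "II",  "III", "III"),
    "remote":     ("II",  "III", "III", "IV"),
    "improbable": ("III", "III", "IV",  "IV"),
    "incredible": ("IV",  "IV",  "IV",  "IV"),
}


def risk_class(frequency, consequence):
    """Qualitative risk class of a hazard according to Table 4.8."""
    return RISK_MATRIX[frequency][CONSEQUENCES.index(consequence)]


def risk_number(f_h, c):
    """Quantitative hazard risk number R = F_h * C, Eq. (4.2)."""
    return f_h * c


print(risk_class("occasional", "critical"))  # II
print(risk_class("remote", "negligible"))    # IV
print(risk_number(1e-6, 4))                  # 4e-06
```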

Based on the probability of dangerous failures (for a high, constant value of C), the IEC has proposed four safety integrity levels (SIL) for electronic programmable systems, Table 4.9. Safety integrity is to be seen as a measure of the likelihood of the safety system correctly performing its tasks.


Table 4.9. Safety integrity levels (SIL) for safety-related electronic programmable systems and dangerous failures

Safety integrity level (SIL) | Failure probability per hour | Operating hours per failure | Operating years per failure
4 | 10^-9 ... 10^-8 | 10^8 ... 10^9 | 10^4 ... 10^5
3 | 10^-8 ... 10^-7 | 10^7 ... 10^8 | 10^3 ... 10^4
2 | 10^-7 ... 10^-6 | 10^6 ... 10^7 | 10^2 ... 10^3
1 | 10^-6 ... 10^-5 | 10^5 ... 10^6 | 10 ... 100


If the effect of a safety-critical failure depends on the operational state, the risk number can be modified by the frequency of the operational state $F_{op}$

$$R_{op} = F_h \times F_{op} \times C \qquad (4.3)$$

see, e.g. [4.8], [4.9]. This applies, e.g. for vehicles with operational states like acceleration, cruising with high or low speed, braking with engine or brake, cornering with or without acceleration or deceleration.
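Equation (4.3) and the SIL bands of Table 4.9 can be combined in a short sketch. The function `required_sil` encodes one plausible reading, namely choosing the lowest SIL whose whole band lies at or below a tolerable dangerous-failure rate; it is an assumption for illustration, not a rule quoted from the IEC standard. All numerical values are hypothetical.

```python
# Bands of Table 4.9: probability of a dangerous failure per hour.
SIL_BANDS = {4: (1e-9, 1e-8), 3: (1e-8, 1e-7), 2: (1e-7, 1e-6), 1: (1e-6, 1e-5)}


def risk_number_op(f_h, f_op, c):
    """Risk number weighted by the operational state, Eq. (4.3)."""
    return f_h * f_op * c


def sil_of(pfh):
    """SIL whose band contains the failure probability per hour, else None."""
    for sil, (low, high) in SIL_BANDS.items():
        if low <= pfh < high:
            return sil
    return None


def required_sil(max_pfh):
    """Lowest SIL whose upper band limit does not exceed the tolerable rate."""
    for sil in (1, 2, 3, 4):
        if SIL_BANDS[sil][1] <= max_pfh:
            return sil
    return None  # stricter than SIL 4


# Hypothetical values: F_h in 1/h, F_op in h/year, C in persons affected.
r_op = risk_number_op(1e-7, 500, 2)  # about 1e-4 persons per year
print(sil_of(3e-8))        # 3
print(required_sil(3e-7))  # 3
```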


4.6 Integrated reliability and safety design 

The existing methods for analyzing reliability and safety can now be combined appropriately. Figure 4.3 shows an overall scheme. The FMEA identifies all components, failures, causes and effects. The single failures proceed to an FTA to determine the causes and their logic interconnection on a component level. The failure causes are then used to design the overall reliability. Remaining failures which cannot be avoided are then classified and determine the maintenance procedure.

Based on the FMEA, the hazard analysis extracts safety-critical failures. Their presentation in a (reduced) fault tree determines the causes with logic interconnections, [4.12], i.e. dangerous faults leading to hazards. Based on this, the safety system at lower levels can be designed. Remaining dangerous failures then undergo a risk classification and determine the supervision and safety methods at higher levels to reduce the risk to an acceptable level.

In addition, ways of fault tolerance can be implemented at component and unit level to improve both reliability and safety. Figure 4.3 indicates the integrated reliability and safety procedure during the design and testing phases.

The unavoidable failures have to be covered by maintenance and by on-line supervision and safety methods during operation, including fault tolerance, protection and supervision with fault detection and diagnosis and appropriate safety actions. These methods are discussed in the following chapters.


4.7 Problems 

1) What are the steps for achieving a system with high safety integrity? Which steps 
have to be performed during design and which ones during operation? 

2) Which methods are known to investigate the effect of faults? 

3) Give some examples for systematic faults and random faults. 

4) What kind of faults are typical for mechanical and electronic components and 
for software? 

5) Draw an event tree and a fault tree for an electromagnetic valve and a DC motor. 

6) What are the differences between an event tree and a FMEA worksheet? 

7) What are the differences between reliability analysis methods and hazard analysis?

8) Hazard risk numbers $R_{op}$ according to (4.3) have to be calculated for an aircraft with an automatic control system and for a passenger car with a steer-by-wire control system, under the assumption that a total breakdown of the control systems causes a major effect. The assumed parameters are:

• one passenger aircraft with $F_h$ = 10“ s [1/h]; $C$ = 200 people; $F_{op}$ = 5000 h/year;

• one passenger car with $F_h = 10^{-6}$ [1/h]; $C$ = 4 people; $F_{op}$ = 300 h/year.

What safety integrity levels (SIL) are recommended for one passenger aircraft and one passenger car if the number of injured people is 1 in 100 years or 1 in 1000 years?

Fig. 4.3. Integrated design procedures for system reliability and safety to result in high system integrity, [4.5]

9) How does the hazard risk number $R_{op}$ of Problem 8 change if total fleets of aircraft and cars are considered, with fleet sizes $n_{aircraft} = 10^3$ and $n_{cars} = 10^6$?






Part II 


Fault-Detection Methods 
Process Models and Fault Modelling


Model-based methods of fault detection use the relations between several measured variables to extract information on possible changes caused by faults. These relations are mostly analytical relations in the form of process model equations, but can also be causalities in the form of, e.g. if-then rules. Figure 5.1 shows a general scheme for process model-based fault detection. The relations between the measured input signals $U$ and output signals $Y$ are represented by a mathematical process model. Fault-detection methods then extract special features, like parameters $\Theta$, state variables $x$ or residuals $r$. By comparing these observed features with their nominal values and applying methods of change detection, analytical symptoms $s$ are generated.

These symptoms are the basis for fault diagnosis. For the application of model-based fault-detection methods, the process configurations according to Figure 5.2 have to be distinguished. With regard to the inherent dependencies used for fault detection and the possibilities for distinguishing between different faults, the situation improves from a to b or c by the availability of more measurements, as will be shown later. The applied process models can be classified according to

• continuous models; 

• discrete-event models; 

both in 

• continuous time; 

• discrete time. 

The continuous models are, in general, equation-based with further subclasses such as linear, nonlinear and time-variant. Discrete event models are, e.g. finite state machines, functional diagrams or Petri nets.

In the following, some basic continuous models are considered. It is of special importance for the development of fault-detection methods how faults can be represented within these process models.
Fig. 5.2. Process configuration for model-based fault detection: (a) SISO (single-input single-output); (b) SISO with intermediate measurements; (c) SIMO (single-input multi-output); (d) MIMO (multi-input multi-output)
5.1 Fault models

A suitable modelling of faults is important for the right functioning of fault-detection methods. A realistic approach presupposes an understanding of the relation between the real physical faults and their effect on the mathematical process models. This can usually only be provided by inspection of the considered real process, an understanding of the physics and a fault-symptom-tree analysis. There are many reasons for the appearance of faults. They stem, for example, from

1) wrong design, wrong assembly;

2) wrong operation, missing maintenance;

3) ageing, corrosion, wear during normal operation.

With regard to the operation phase, they may already be present, or they may appear suddenly with a small or large size, in steps, or gradually like a drift. They can be considered as deterministic faults. Intermittent faults, which appear as stochastic faults, are usually especially disadvantageous.

5.1.1 Basic fault models 

A fault was defined in Chapter 2 as an unpermitted deviation of at least one characteristic property, called feature, from the usual condition. The feature can be any physical quantity. If the quantity is part of a physical law $Y(t) = g[U(t), x(t), \Theta]$ in the form of an equation, and measurements of $U(t)$ and $Y(t)$ are available, the feature expresses itself either as input variable $U(t)$, output variable $Y(t)$, state variable $x_i(t)$ (time-dependent function) or parameter $\theta_i$ (usually constant value). Hence, faults may appear as changes of signals or parameters. The time dependency of faults may show up as, compare Figure 5.3,

• abrupt fault (stepwise); 

• incipient fault (drift-like); 

• intermittent fault (with interrupts). 
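The three time patterns can be generated as test signals $f(t)$, e.g. for checking a fault-detection algorithm in simulation. In this sketch the fault onset `t0`, the sizes and the periodic on/off pattern used for the intermittent fault are assumptions; real intermittent faults occur at random instants.

```python
def abrupt(t, t0, size):
    """Stepwise fault: zero before t0, constant size afterwards."""
    return size if t >= t0 else 0.0


def incipient(t, t0, slope):
    """Drift-like fault: grows linearly from t0 on."""
    return slope * (t - t0) if t >= t0 else 0.0


def intermittent(t, t0, size, period, duty=0.5):
    """Fault that appears and disappears after t0 (periodic simplification)."""
    if t < t0:
        return 0.0
    return size if ((t - t0) % period) < duty * period else 0.0


times = [0, 1, 2, 3, 4, 5]
print([abrupt(t, 2, 1.0) for t in times])            # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print([incipient(t, 2, 0.5) for t in times])         # [0.0, 0.0, 0.0, 0.5, 1.0, 1.5]
print([intermittent(t, 2, 1.0, 2) for t in times])   # [0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```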

With regard to the corresponding signal flow diagrams, see Figure 5.4, the changes of signals are additive faults, because a variable $Y_u(t)$ is changed by an addition of $f(t)$

$$Y(t) = Y_u(t) + f(t) \qquad (5.1)$$

and the changes of parameters are multiplicative faults, because another variable $U(t)$ is multiplied by $f(t)$

$$Y(t) = (a + \Delta a(t))U(t) = aU(t) + \Delta a(t)U(t) = Y_u(t) + f(t)U(t) \qquad (5.2)$$

with $Y_u(t) = aU(t)$ and $f(t) = \Delta a(t)$.

For the additive fault, the detectable change $\Delta Y(t)$ of the variable is independent of any other signal

$$\Delta Y(t) = f(t) \qquad (5.3)$$
Fig. 5.3. Time dependency of faults: (a) abrupt; (b) incipient; (c) intermittent


(Instead of the output signal $Y(t)$, the input signal $U(t)$ or a state variable $x_i(t)$ can be influenced.)

However, for the multiplicative fault, the detectable change of the output $\Delta Y(t)$ depends on the input signal $U(t)$

$$\Delta Y(t) = f(t)U(t) \qquad (5.4)$$

This means that, if the signal $Y(t)$ can be measured, the additive fault is detectable for any $Y_u(t)$, but the multiplicative fault can only be detected if $U(t) \neq 0$. The size of the change $\Delta Y(t)$ then depends on the size of $U(t)$.
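The different detectability of the two fault types can be checked with the static process $Y = aU$ of (5.2) and a residual between the measured output and the nominal model output. The gain `a`, the fault sizes and the function names are hypothetical:

```python
def faulty_output(u, a, f_add=0.0, f_mult=0.0):
    """Static process Y = a*U with an additive (5.1) and/or multiplicative (5.2) fault."""
    return (a + f_mult) * u + f_add


def residual(y, u, a):
    """Measured output minus nominal model output a*U, i.e. the change Delta Y."""
    return y - a * u


a = 2.0  # hypothetical nominal gain
for u in (0.0, 1.0, 4.0):
    r_add = residual(faulty_output(u, a, f_add=0.5), u, a)
    r_mult = residual(faulty_output(u, a, f_mult=0.5), u, a)
    print(u, r_add, r_mult)
# 0.0 0.5 0.0   (multiplicative fault invisible without excitation)
# 1.0 0.5 0.5
# 4.0 0.5 2.0
```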


m=AY(t) 

Uf)*m=Yu(t)+M 


fit)=Aa(t) 


i 


-► 

a 

m 



Y(t)=(a+f(t))U(t) 


(a) 


(b) 


Fig. 5.4. Basic fault models: (a) additive fault for an output signal; (b) multiplicative fault 


5.1.2 Examples for fault models 

Because the kinds of faults and their modelling depend primarily on the actual process, some typical examples are considered.

Example 5.1: Sensor faults 

Sensors and measurement systems are dynamic transfer elements for which only the output $Y(t)$, the measured variable, is accessible. Without additional calibration equipment, the real physical input $Y_0(t)$ is unknown. The static behavior of a sensor may be linear

$$Y(t) = c_0 + c_1 Y_0(t)$$

or nonlinear

$$Y(t) = c_0 + c_1 Y_0(t) + c_2 Y_0^2(t) + \ldots$$

For small changes, the dynamic behavior of a sensor can frequently be approximated by a linear model with transfer function

$$G_s(s) = \frac{\Delta Y(s)}{\Delta Y_0(s)} = \frac{y(s)}{y_0(s)} = \frac{B_s(s)}{A_s(s)}$$

and in simple cases by a first order lag

$$G_s(s) = \frac{K_s}{1 + T_1 s}$$

with the gain (evaluated at the operating point)

$$K_s = c_1 + 2 c_2 Y_0 + \ldots$$

The sensor output is now usually influenced by several kinds of disturbances. According to [5.19], one can distinguish between external and internal disturbances, see Figure 5.5.

Fig. 5.5. Block diagram of a sensor or measurement equipment influenced by disturbances (external disturbances: superimposing and deforming; internal disturbances; real value $Y_0$, measured value $Y$)


External disturbances are generated by the sensor environment. Frequently, superimposed disturbances $z_1(t)$ arise, e.g. by induced electromagnetic influence

$$Y(t) = c_1[Y_0(t) + z_1(t)] + c_0$$

The size of the sensor fault

$$\Delta Y(t) = c_1 z_1(t)$$

is independent of the true value $Y_0$, see Figure 5.6a. Environmental changes $z_2(t)$ like temperature, fluid flow velocity or contamination deform the transfer behavior by changing the gain and the time constant

$$[T_1 + \Delta T_1(z_2)]\,\dot{y}(t) + y(t) = [K_s + \Delta K_s(z_2)]\,y_0(t)$$

This leads to a static and dynamic deviation

$$\Delta y(t) = -\Delta T_1(z_2)\,\dot{y}(t) + \Delta K_s(z_2)\,y_0(t)$$

and results in parametric faults.




Fig. 5.6. Effect of different faults on the static sensor reading $Y(t)$ for the measured value $Y_0(t)$: (a) zero offset; (b) change of gain; (c) change of response value; (d) change of hysteresis


Internal disturbances are caused by the sensor itself. Typical faults are changes in the power supply, resistances, capacitances, inductances, backlash or friction. Then both the static and the dynamic sensor behavior can change.

Figure 5.6 summarizes the impact of external and internal disturbances on the sensor output signals in the form of static characteristics. Mainly three different sensor faults can be distinguished:

a) constant offset $\Delta Y$;

b) change of gain $\Delta K_s$, resulting in a value-dependent offset $\Delta Y(Y_0)$;

c) direction-dependent offset (hysteresis) $\Delta Y(\mathrm{sign}\,\dot{Y}_0)$.

Hence, cases a and b can be modelled as additive faults and case c as a direction-dependent additive fault.
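The three static sensor faults can be superimposed on the linear characteristic $Y = c_0 + c_1 Y_0$ in a small Python sketch. The hysteresis direction is taken here as the sign of the rate of change of $Y_0$, which is an assumption about how the direction is defined; the parameter names are illustrative.

```python
def sensor_reading(y0, c0=0.0, c1=1.0, offset=0.0, gain_change=0.0,
                   hysteresis=0.0, y0_rate=0.0):
    """Static sensor characteristic Y = c0 + c1*Y0 with the fault types a) to c).

    offset      -- a) constant offset Delta Y
    gain_change -- b) gain change Delta K_s, giving the value-dependent offset
    hysteresis  -- c) direction-dependent offset, sign taken from dY0/dt
    """
    direction = (y0_rate > 0) - (y0_rate < 0)  # +1 rising, -1 falling, 0 at rest
    return c0 + (c1 + gain_change) * y0 + offset + hysteresis * direction


y0 = 10.0
print(sensor_reading(y0, offset=0.5))                   # 10.5
print(sensor_reading(y0, gain_change=0.5))              # 15.0
print(sensor_reading(y0, hysteresis=0.25, y0_rate=1))   # 10.25 (rising)
print(sensor_reading(y0, hysteresis=0.25, y0_rate=-1))  # 9.75 (falling)
```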

Some faults result not only in a deterministic change but also change the stochastic character of the sensor output. Therefore, the mean value and the variance of the output are also fault models

$$\bar{Y} = E\{Y(t)\} \quad \text{and} \quad \sigma_Y^2 = E\{[Y(t) - \bar{Y}]^2\}$$
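Changes of the stochastic character can be monitored by estimating $\bar{Y}$ and $\sigma_Y^2$ over windows of samples. The sketch below, with hypothetical sample values, compares the variance of a current window with that of a fault-free reference window; the choice of a decision threshold is deliberately left open.

```python
from statistics import fmean, pvariance


def window_stats(samples):
    """Estimate mean E{Y} and variance sigma^2 of a window of sensor samples."""
    return fmean(samples), pvariance(samples)


def variance_ratio(reference, current):
    """Current-to-reference variance ratio; values well above 1 point to a
    changed stochastic character of the sensor output."""
    return pvariance(current) / pvariance(reference)


reference = [1.0, 1.2, 0.8, 1.0]  # hypothetical fault-free window
current = [1.0, 1.6, 0.4, 1.0]    # hypothetical window: same mean, more noise
print(round(window_stats(current)[0], 6))            # 1.0
print(round(variance_ratio(reference, current), 6))  # 9.0
```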



A change of the dynamic behavior of the sensors, e.g. by a time constant change $\Delta T_1$, has to be modelled together with the process and treated as a multiplicative fault.

□ 


Example 5.2: Actuator faults 

An electromagnetic proportional acting flow valve is considered, see Figure 5.7. The static behavior depends on the characteristics of the current $I = f_1(U)$, the position $z = f_2(I)$, the valve area $A_p = f_3(z)$ and the mass flow $\dot{m} = f_4(A_p)$. The overall static behavior then results from $\dot{m} = f(U)$. Assuming a polynomial characteristic, the overall function can be approximated, e.g. by

$$\dot{m} = c_0 + c_1 U + c_2 U^2$$

assuming that the pressure $p_1$ does not change, $p_1 = \text{const.}$

Additive faults are those which result in a parallel shift of the characteristic, which is lumped in the constant $c_0$. The corresponding physical faults are, e.g. an offset in $U$, a change of the spring counter position or spring pretension, or a change of the flow area $A_p$ in zero position, $A_p(z = 0)$. Dry (Coulomb) friction can be modelled as a direction-dependent offset of the input

$$\Delta U = U_{c0}\,\mathrm{sign}\,\dot{U}$$

and backlash as

$$\Delta U = U_{b0}\,\mathrm{sign}\,\dot{U}$$

see [5.11].

Multiplicative faults are contained in the parameters $c_1$ and $c_2$. Examples are: change of flux in coils (e.g. resistance of coil wires), air gap in the electromagnet, friction at shaft or magnet, gain of the power electronics, change of valve piston geometry, change of pressure difference $\Delta p = p_1 - p_2$. Friction of the shaft or magnet or backlash can also be modelled as a direction-dependent parameter. (Corresponding equations for the components follow, e.g. from [5.11].) Hence, many faults belong to the class of multiplicative faults.

□ 
Fig. 5.7. Electromagnetic flow valve

Example 5.3: Electrical cable connection

An electrical cable connection with total resistance $(R_1 + R_2)$ is considered, which supplies a consumer resistance $R_4$, see Figure 5.8. A fault now happens in the form of a short circuit with resistance $R_3$ between the resistances $R_1$ and $R_2$, resulting in a short-circuit current $I_3(t) = f_I(t)$. This short circuit can now be modelled as a changing resistance $R_3 \in (0, \infty)$.

Fig. 5.8. Electrical cable connection with consumer $R_4$ and short-circuit fault through $R_3$

a) Measurement of $I_1(t)$ and $I_4(t)$


If the fault is interpreted as a changing resistance $R_3 = f_R$, then Kirchhoff's node and mesh laws yield

$$I_4(t) = \frac{1}{1 + \frac{R_2 + R_4}{f_R}}\,I_1(t) = g_1(f_R)\,I_1(t)$$

and the fault becomes a parametric fault because the "gain" $g_1(f_R)$ changes. If, however, only the current balance equation (node law)

$$I_4(t) = I_1(t) - I_3(t)$$

is applied, the short-circuit current seems to appear as an additive fault. But then the offset $f_I(t)$ depends on the applied input current $I_1(t)$, as

$$I_4(t) - I_1(t) = -f_I(t) = [g_1(f_R) - 1]\,I_1(t)$$

This means that the size of the "additive" fault depends on the input, which is not in agreement with its definition. Only for a constant operating point $I_1$ can the short circuit be considered as an additive fault on $I_4(t)$.


b) Measurement of $U_1(t)$ and $I_4(t)$

By use of Kirchhoff's laws, one obtains for the current without short circuit

$$I_4(t) = \frac{1}{R_1 + R_2 + R_4}\,U_1(t) = \frac{1}{R_{tot1}}\,U_1(t)$$

If a short circuit with resistance $R_3 = f_R$ happens between $R_1$ and $R_2$, the consumer current becomes

$$I_4(t) = \frac{1}{\left(1 + \frac{R_1}{f_R}\right)(R_2 + R_4) + R_1}\,U_1(t) = \frac{1}{R_{tot2}}\,U_1(t) = g_2(f_R)\,U_1(t)$$

Hence, also in this equation a short circuit expresses itself as a multiplicative fault. By parameter estimation, $R_{tot2}$ can be determined and the size of the fault $f_R$ can be calculated if $R_1$, $R_2$ and $R_4$ are known. (For $f_R \to \infty$ there is no short circuit.)
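For measurement case b), the estimated total resistance can be inverted for the fault size, since $R_{tot2} = R_1 + R_2 + R_4 + R_1(R_2 + R_4)/f_R$. The sketch assumes known, hypothetical values of $R_1$, $R_2$, $R_4$ and uses a computed $R_{tot2}$ in place of an actual parameter estimate; function names are illustrative. By the analogy of Figure 5.9, the same functions would apply to the pipeline with $c_{R1}$, $c_{R2}$, $c_{R4}$.

```python
import math


def g1(f_r, r2, r4):
    """Gain I4/I1 for measurement case a), short-circuit resistance f_r."""
    return 1.0 / (1.0 + (r2 + r4) / f_r)


def r_tot2(f_r, r1, r2, r4):
    """Total resistance U1/I4 for case b): (1 + R1/f_r)(R2 + R4) + R1."""
    return (1.0 + r1 / f_r) * (r2 + r4) + r1


def fault_size(r_tot_est, r1, r2, r4):
    """Invert r_tot2 for f_r, given an estimated total resistance."""
    excess = r_tot_est - (r1 + r2 + r4)  # equals R1 (R2 + R4) / f_r
    if excess <= 0.0:
        return math.inf                  # no short circuit detectable
    return r1 * (r2 + r4) / excess


r1, r2, r4 = 1.0, 1.0, 8.0               # hypothetical known resistances in ohm
estimated = r_tot2(3.0, r1, r2, r4)      # stands in for a parameter estimate
print(g1(3.0, r2, r4))                   # 0.25
print(fault_size(estimated, r1, r2, r4)) # 3.0
```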

□ 


Example 5.4: Pipeline 

A liquid flow through a pipeline, Figure 5.9, is considered. It is assumed that the flows at the input $\dot{m}_1$ and at the output $\dot{m}_4$ and the pressures $p_1$ and $p_4$ relative to the ambient air pressure can be measured. A fault now happens in the form of a leak with flow $\dot{m}_3(t) = f_m(t)$, compare [5.20], [5.9]. (According to the outflow equation of an orifice, the leak flow $\dot{m}_3$ is proportional to the leak area $A_p$ and to the square root of the pressure difference across the leak.)
Fig. 5.9. Pipeline with a leak $\dot{m}_3$: (a) scheme with measurements; (b) scheme with analogy to the electrical circuit of Figure 5.8

a) Measurement of $\dot{m}_1(t)$ and $\dot{m}_4(t)$

The mass balance equation leads to

$$\dot{m}_4(t) = \dot{m}_1(t) - f_m(t)$$

Therefore, for a constant operating point $\dot{m}_1 = \text{const.}$, the leak flow shows up as an additive fault.

Now an interconnection of resistances can be set up analogous to the electrical circuit, see Figure 5.9b, assuming, as a simplification, laminar flow through the pipeline and its resistances and therefore a linear resistance law like

$$p_1 - p_4 = (c_{R1} + c_{R2})\,\dot{m}$$

with $\dot{m} = \dot{m}_1 = \dot{m}_4$ in the case of no leak. If the leak hole in the pipeline has a resistance $c_{R3} = f_R$, one obtains, analogous to the electrical short circuit,

$$\dot{m}_4(t) = \frac{1}{1 + \frac{c_{R2} + c_{R4}}{f_R}}\,\dot{m}_1(t) = g_1(f_R)\,\dot{m}_1(t)$$

The fault is therefore a multiplicative fault when arbitrary operating points $\dot{m}_1$ are considered.

b) Measurement of $p_1(t)$ and $\dot{m}_4(t)$

Analogous to the electrical short-circuit example, one obtains without leak

$$\dot{m}_4(t) = \frac{1}{c_{R1} + c_{R2} + c_{R4}}\,p_1(t) = \frac{1}{c_{Rt1}}\,p_1(t)$$

and with a leak of resistance $c_{R3} = f_R$

$$\dot{m}_4(t) = \frac{1}{\left(1 + \frac{c_{R1}}{f_R}\right)(c_{R2} + c_{R4}) + c_{R1}}\,p_1(t) = \frac{1}{c_{Rt2}}\,p_1(t) = g_2(f_R)\,p_1(t)$$


Herewith, the leak appears as a multiplicative fault. As in the electrical example, the leak appears as an additive fault only with the balance equation and a constant operating point, and as a multiplicative fault with the more explicit modelling in the form of a resistance law for the leak hole and the whole operating range.

Another type of fault is clogging. Then the resistance, e.g. $c_{R1}$, becomes larger, and so does $c_{Rt1}$. Hence, clogging is also a multiplicative fault.

□ 


These examples show that the kind of fault, additive or multiplicative, depends on the models used and on the physical nature of the fault. Several sensor faults can be modelled as additive faults. Actuators show more multiplicative than additive faults. Processes may show additive faults if only balance equations are used. If, however, constitutive and phenomenological laws are applied to model processes in more detail, multiplicative faults appear frequently. In summary, modelling of faults requires the consideration of the underlying physical effects.


5.2 Process models 

5.2.1 Theoretical and experimental modelling 

Mathematical models of dynamic processes are primarily obtained either by theoretical/physical modelling or experimentally by identification methods. For theoretical modelling, also called theoretical analysis or modelling by first principles, the model is set up on the basis of mathematically formulated laws of nature. For this, first the process elements are considered. By combining their models, one obtains models of subprocesses and overall processes. The theoretical modelling always begins with simplifying assumptions about the process, which simplify the calculations or make them feasible at all with a tolerable effort. One can distinguish the following types of basic equations:

(1) balance equations for stored masses, energies and momentum;

(2) constitutive equations (physical-chemical state equations) of special elements;

(3) phenomenological equations, if irreversible processes (equalizing processes) take place (e.g. equations for thermal conduction, diffusion or chemical reaction);

(4) entropy balance equations, if several irreversible processes take place (if not already considered by (3));

(5) connection equations (describing the interconnection of the process elements).

For distributed parameter systems, the dependency on space and time has to be considered. This usually leads to partial differential equations. If the space dependency is negligible, the systems can be considered with lumped parameters. These are described by ordinary differential equations as a function of time.

By summarizing the basic equations of all process elements, one obtains a system of ordinary and/or partial differential equations of the process. This leads to a theoretical process model with a certain structure and certain parameters, if it can be solved explicitly. Frequently, this model is extensive and complicated, so it must be simplified for further applications.

The simplifications are made by linearization, reduction of the model order or approximation of systems with distributed parameters by lumped parameters at fixed locations. The first steps of these simplifications can already be made by simplifying assumptions while stating the basic equations.

But even if the set of equations cannot be solved explicitly, the individual equations supply important hints for the model structure. So, e.g. balance equations are always linear and some phenomenological equations are linear over wide ranges. The constitutive equations often introduce nonlinear relations.

During experimental modelling, which is called identification, one obtains the mathematical model of a process from measurements. Here, one always proceeds from a priori knowledge gained, e.g., from a theoretical analysis or from preceding measurements. Then, input and output signals are measured and evaluated by means of identification methods in such a way that the relation between the input and output signals is expressed in a mathematical model. The input signals can be natural operating signals (occurring in the system) or artificially introduced test signals. Depending on the application purpose, identification methods for parametric or nonparametric models can be used. The result of the identification is then an experimental model. A detailed description of the different techniques can be found, e.g., in [5.3], [5.13] and [5.16].
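As a minimal sketch of identification for a parametric model, the following estimates the two parameters of a discrete-time first-order model $y(k) + a_1 y(k-1) = b_1 u(k-1)$ by ordinary least squares from measured input and output data. The model order, the "true" parameter values, the binary test signal and the noise level are illustrative assumptions, not values from the text; with larger output noise this simple least-squares estimate becomes biased.

```python
import numpy as np

def simulate_first_order(a1, b1, u):
    """Difference equation y(k) + a1*y(k-1) = b1*u(k-1) with y(0) = 0."""
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        y[k] = -a1 * y[k - 1] + b1 * u[k - 1]
    return y

def estimate_first_order(u, y):
    """Least-squares estimate of theta = [a1, b1] from y(k) = psi^T(k) theta."""
    # One regressor row psi^T(k) = [-y(k-1), u(k-1)] per sample k = 1, ..., N-1
    Psi = np.column_stack([-y[:-1], u[:-1]])
    theta, *_ = np.linalg.lstsq(Psi, y[1:], rcond=None)
    return theta

rng = np.random.default_rng(0)
u = rng.choice([-1.0, 1.0], size=500)               # artificial binary test signal
y = simulate_first_order(-0.8, 0.4, u)              # hypothetical "true" process
y_meas = y + 0.01 * rng.standard_normal(len(y))     # small measurement noise
a1_hat, b1_hat = estimate_first_order(u, y_meas)
print(round(a1_hat, 3), round(b1_hat, 3))           # close to -0.8 and 0.4
```

The regressor matrix `Psi` is the sampled counterpart of the data vector $\psi^T$ used for the vector form of the process model.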

The theoretical and the experimental model can be compared, provided both types of modelling can be realized. If the two models do not agree, one can conclude from the type and size of the differences which particular steps of the theoretical or experimental modelling have to be corrected.

Theoretical and experimental modelling thus complement each other. The theoretical model contains the functional relation between the physical data of the process and its parameters. Therefore, one will use this model, e.g., if the process is to be designed favorably with regard to its dynamic behavior or if the process behavior has to be simulated before construction. The experimental model, on the other hand, contains parameters as numerical values whose functional relation to the physical basic data of the process remains unknown. In many cases, the real dynamic behavior can be described more exactly, or determined with less effort, by experimentally obtained models, which are, e.g., better suited to tuning a feedback controller, to predicting signals or to fault detection.

Theoretical models are also called “white-box models” and experimental models “black-box models”. In practice, however, there frequently exist types of models that lie between these two, Figure 5.10. If, for example, the physical laws are known but the parameters are not and have to be determined experimentally, the resulting models can be called “light-grey models”. If only physical if-then rules are known and the parameters have to be determined by experiments, “dark-grey models” result.



Fig. 5.10. Different kinds of mathematical process models


The procedure of theoretical modelling of technical processes is treated extensively in [5.11]. Despite the large variety of existing process elements, a certain degree of systematization can be reached. This is supported by many similarities and analogies not only between mechanical and electrical processes, but also among thermal, thermodynamic and chemical processes.

5.2.2 Static process models 

The static (steady-state) behavior of processes is usually described by graphically represented characteristic curves, which are obtained either experimentally or by calculation from static analytical process models. In many cases they are expressed or approximated by polynomials such as

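$$Y = \beta_0 + \beta_1 U + \beta_2 U^2 + \dots + \beta_q U^q$$

$$Y = \psi_s^T \theta_s$$

$$\theta_s^T = [\beta_0 \;\; \beta_1 \;\; \beta_2 \;\; \dots \;\; \beta_q], \qquad \psi_s^T = [1 \;\; U \;\; U^2 \;\; \dots \;\; U^q] \tag{5.5}$$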

Changes of $\beta_0$ are additive faults and changes of $\beta_i$, $i = 1, \dots, q$, are multiplicative faults. Input signal faults $f_U$ and output signal faults $f_Y$ are additive faults, compare Figure 5.11. (Here it holds that $f_Y = \Delta\beta_0$.) The parameters $\beta_i$ frequently depend on different physical process parameters $p_j$; see Section 5.2.3 and the examples for, e.g., the characteristics of flow valves or the circulation pump in Chapter 21.
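The distinction between additive and multiplicative faults in the static model (5.5) can be illustrated numerically. This sketch assumes a hypothetical second-order characteristic ($q = 2$) and a fault size of 0.3; a change of $\beta_0$ shifts the output by the same amount at every input value, whereas a change of $\beta_1$ produces a deviation that grows with $U$.

```python
def static_output(U, beta, f_Y=0.0):
    """Polynomial characteristic Y = beta_0 + beta_1*U + ... + beta_q*U**q (5.5),
    plus an additive output signal fault f_Y."""
    return sum(b * U**i for i, b in enumerate(beta)) + f_Y

beta = [1.0, 2.0, 0.5]              # hypothetical beta_0, beta_1, beta_2
inputs = [0.0, 1.0, 2.0, 4.0]
nominal = [static_output(U, beta) for U in inputs]

# Additive fault: change of beta_0 (acts like f_Y = delta beta_0)
offset = [static_output(U, [beta[0] + 0.3, beta[1], beta[2]]) for U in inputs]
# Multiplicative fault: change of beta_1
gain = [static_output(U, [beta[0], beta[1] + 0.3, beta[2]]) for U in inputs]

print([round(f - n, 9) for f, n in zip(offset, nominal)])   # [0.3, 0.3, 0.3, 0.3]
print([round(f - n, 9) for f, n in zip(gain, nominal)])     # [0.0, 0.3, 0.6, 1.2]
```

Because the multiplicative deviation is zero at $U = 0$, such a fault is only visible if the input moves away from that operating point.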



Fig. 5.11. Nonlinear static process model with parametric faults $\Delta\beta_i$ ($i = 1, \dots, q$), additive input signal faults $f_U$ and output signal faults $f_Y$


5.2.3 Linear dynamic process models 

Compared with the static behavior, the dynamic process behavior contains additional information on the process with regard to changes due to faults. This holds because mass, energy or momentum storages are excited by time-varying input signals, and both relevant internal parameters and internal state variables provide information on changes caused by faults [5.8].

As an introduction to this topic, an example of a spring-mass-damper system is considered.


Example 5.5: Spring-mass-damper system


A mechanical oscillator with a serial connection of spring, mass and damper, or a parallel connection of spring and damper, leads to the differential equation

$$m\,\ddot{z}(t) + d\,\dot{z}(t) + c\,z(t) = \Delta F_1(t)$$


where $\Delta F_1$ is the excitation force at the mass and $z$ is the deviation of the mass from a steady-state value.

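The corresponding transfer function is obtained by Laplace transformation

$$G_{zF}(s) = \frac{z(s)}{\Delta F_1(s)} = \frac{1}{m s^2 + d s + c} = \frac{\frac{1}{c}}{\frac{m}{c} s^2 + \frac{d}{c} s + 1} = \frac{K}{\frac{1}{\omega_0^2} s^2 + \frac{2D}{\omega_0} s + 1} = \frac{K}{T_2^2 s^2 + T_1 s + 1} = \frac{K}{a_2 s^2 + a_1 s + 1}$$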


with $K = 1/c$, $a_1 = d/c$ and $a_2 = m/c$. The static behavior reduces to

$$z = \frac{1}{c}\,\Delta F_1 = K\,\Delta F_1$$

and therefore only information on the spring constant $c$ is obtained. For dynamic excitation, however, the behavior is described by three parameters: $m$, $c$ and $d$. Hence, the two parameters mass $m$ and damping $d$ (or natural frequency $\omega_0$ and damping ratio $D$) give additional information on the process condition when the dynamic behavior is considered.

□ 
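The mapping from the physical parameters $m$, $d$, $c$ to the transfer-function coefficients of Example 5.5 can be written as a small function. The numerical values below are hypothetical; the sketch also shows that the static gain $K$ depends on $c$ alone, while $a_1$, $a_2$, $\omega_0$ and $D$ carry the information on $m$ and $d$.

```python
import math

def smd_transfer_params(m, d, c):
    """Coefficients of G(s) = K / (a2*s**2 + a1*s + 1) for m*z'' + d*z' + c*z = F."""
    K = 1.0 / c
    a1 = d / c
    a2 = m / c
    omega0 = math.sqrt(c / m)               # natural frequency: a2 = 1/omega0**2
    D = d / (2.0 * math.sqrt(c * m))        # damping ratio: a1 = 2*D/omega0
    return K, a1, a2, omega0, D

K, a1, a2, omega0, D = smd_transfer_params(m=2.0, d=4.0, c=50.0)
print(K, a1, a2, omega0, D)

# A different mass and damping leave the static gain unchanged:
print(smd_transfer_params(m=3.0, d=1.0, c=50.0)[0] == K)   # True
```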


a) Continuous-time dynamic process models 


The general description of a dynamically excited process with lumped parameters is given by ordinary differential equations, which are generally nonlinear. By considering small deviations around an operating point $(Y_{00}, U_{00})$, the input/output behavior can usually be simplified to a linear ordinary differential equation


$$y(t) + a_1 y^{(1)}(t) + \dots + a_n y^{(n)}(t) = b_0 u(t) + b_1 u^{(1)}(t) + \dots + b_m u^{(m)}(t) \tag{5.6}$$

$$y(t) = Y(t) - Y_{00}; \qquad u(t) = U(t) - U_{00} \tag{5.7}$$

where $y^{(n)}(t) = d^n y(t)/dt^n$ are the derivatives.

This linear process model can also be written in vector form

$$y(t) = \psi^T(t)\,\theta \tag{5.8}$$

$$\theta^T = [a_1 \; \dots \; a_n \;\; b_0 \; \dots \; b_m] \tag{5.9}$$

$$\psi^T(t) = [-y^{(1)}(t) \; \dots \; -y^{(n)}(t) \;\; u(t) \; \dots \; u^{(m)}(t)] \tag{5.10}$$


The corresponding transfer function follows from Laplace transformation

$$G_p(s) = \frac{y(s)}{u(s)} = \frac{B(s)}{A(s)} = \frac{b_0 + b_1 s + \dots + b_m s^m}{1 + a_1 s + \dots + a_n s^n} \tag{5.11}$$


As shown in Figure 5.12, an input signal fault $f_u$ and an output signal fault $f_y$ are additive faults, and the parameter faults $\Delta a_i$, $\Delta b_j$ are multiplicative faults. Additive input faults $f_u$ and output faults $f_y$ lead to changes of the output signal

$$\Delta y(s) = G_p(s)\, f_u(s) \tag{5.12}$$

$$\Delta y(s) = f_y(s) \tag{5.13}$$

compare the time histories in Figure 5.13. In both cases, deviations of the output result even though the input $U(t)$ is constant. To discuss the influence of parameter faults, an example is considered first.
Fig. 5.12. Linearized dynamic process model with parametric faults $\Delta a_i$, $\Delta b_j$ and additive faults $f_u$ and $f_y$
Fig. 5.13. Responses of a first-order system with constant input $U$ = const. to stepwise: (a) additive input change $f_u(t)$; (b) additive output change $f_y(t)$


Example 5.6: Time-varying parameters of a first-order process
Given the linear first-order process with constant parameters

$$a_1 \dot{Y}(t) + Y(t) = b_0 U(t) + b_1 \dot{U}(t)$$

Setting the derivatives to zero yields the steady state

$$\bar{Y} = b_0 \bar{U}$$

Now changes of the signals are introduced

$$Y(t) = \bar{Y} + \Delta Y(t); \qquad U(t) = \bar{U} + \Delta U(t)$$

leading to

$$a_1 \Delta\dot{Y}(t) + \Delta Y(t) = b_0 \Delta U(t) + b_1 \Delta\dot{U}(t)$$

or

$$a_1 \dot{y}(t) + y(t) = b_0 u(t) + b_1 \dot{u}(t)$$
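where $y(t) = \Delta Y(t)$ and $u(t) = \Delta U(t)$. The influence of parameter changes on the output

$$y(t) = -a_1 \dot{y}(t) + b_0 u(t) + b_1 \dot{u}(t)$$

is

$$\Delta y(t) = \frac{\partial y(t)}{\partial a_1}\Delta a_1 = -\dot{y}(t)\,\Delta a_1 - a_1 \frac{\partial \dot{y}(t)}{\partial a_1}\Delta a_1$$

$$\Delta y(t) = \frac{\partial y(t)}{\partial b_0}\Delta b_0 = -a_1 \frac{\partial \dot{y}(t)}{\partial b_0}\Delta b_0 + u(t)\,\Delta b_0$$

$$\Delta y(t) = \frac{\partial y(t)}{\partial b_1}\Delta b_1 = -a_1 \frac{\partial \dot{y}(t)}{\partial b_1}\Delta b_1 + \dot{u}(t)\,\Delta b_1$$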


Hence, step changes of $\Delta a_1$ result in no change of the output if $u(t)$ and $y(t)$ are in steady state. $\Delta b_0$ changes the gain, and therefore a lasting deviation $\Delta y(t)$ occurs. $\Delta b_1$ is a change of the lead time and does not result in an output change if $\dot{u}(t) = 0$, compare Figure 5.14. However, the situation changes if an input change $u(t) = \Delta U(t)$ occurs. Then transient deviations are observed for the parameter changes $\Delta a_1$ and $\Delta b_1$, as shown in Figure 5.15.
Fig. 5.14. Responses of a first-order system to stepwise parameter changes for constant input $U = \bar{U}$ = const.


This example shows that changes of the gain can be detected by observing the output signal in steady state, but that the input signal must change in order to observe deviations of the other parameters, like $\Delta a_1$ and $\Delta b_1$.

If the parameters change only slowly with respect to the process dynamics, the above equations can be simplified. For example, it holds that

$$\Delta y(t) = -\dot{y}(t)\,\Delta a_1(t) - a_1 \frac{\partial \dot{y}(t)}{\partial a_1}\Delta a_1(t)$$
Fig. 5.15. Responses of a first-order system to stepwise parameter changes and input changes $u(t) = \Delta U(t)$ at the same time instant


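and for a slowly changing $\Delta a_1(t)$ the term $\partial \dot{y}(t)/\partial a_1$ is small. Therefore the simpler equation

$$\Delta y(t) \approx -\dot{y}(t)\,\Delta a_1(t)$$

can be used as an approximation, and similarly

$$\Delta y(t) \approx u(t)\,\Delta b_0(t)$$

$$\Delta y(t) \approx \dot{u}(t)\,\Delta b_1(t)$$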

If the process model is written in vector form as in (5.8),

$$y(t) = \psi^T(t)\,\theta \tag{5.14}$$

then small parameter changes in the process lead to

$$\Delta y(t) = \sum_i \frac{\partial y(t)}{\partial \theta_i}\,\Delta\theta_i \tag{5.15}$$

and for slow parameter changes this simplifies to

$$\Delta y(t) \approx \psi^T(t)\,\Delta\theta \tag{5.16}$$

□
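The steady-state conclusions of Example 5.6 can be reproduced with a short simulation of the first-order process under parameter steps. This is a sketch under stated assumptions: hypothetical values $a_1 = 1$ s, $b_0 = 2$, $b_1 = 0.5$, forward-Euler integration, and a constant input, so it only covers the situation of Figure 5.14, not the simultaneous input step of Figure 5.15.

```python
def final_output(a1=1.0, b0=2.0, b1=0.5, U=1.0, da1=0.0, db0=0.0, db1=0.0,
                 t_fault=5.0, t_end=15.0, dt=1e-3):
    """Forward-Euler simulation of a1*Y'(t) + Y(t) = b0*U(t) + b1*U'(t)
    with constant input U (so U'(t) = 0) and parameter steps at t_fault.
    Returns the output Y at t_end."""
    Y = b0 * U                      # start in the steady state Y = b0*U
    U_dot = 0.0                     # constant input: the lead term is inactive
    for k in range(int(round(t_end / dt))):
        after = k * dt >= t_fault
        a1_k = a1 + da1 if after else a1
        b0_k = b0 + db0 if after else b0
        b1_k = b1 + db1 if after else b1
        Y += dt * (b0_k * U + b1_k * U_dot - Y) / a1_k
    return Y

nominal = final_output()
print(final_output(da1=0.5) - nominal)   # 0.0: no change in steady state
print(final_output(db1=0.5) - nominal)   # 0.0: lead term has no effect for U' = 0
print(final_output(db0=0.5) - nominal)   # about 0.5 = db0*U: lasting gain deviation
```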


The differential equation (5.6) can be transformed into a state-space representation by defining a state vector $x(t)$

$$\dot{x}(t) = A\,x(t) + b\,u(t) \tag{5.17}$$

$$y(t) = c^T x(t) \tag{5.18}$$

see Figure 5.16. Additive faults of the state-variable model are usually modelled as input or state faults $f_l$ or output faults $f_m$. Parameter faults $\Delta b_j$, $\Delta a_i$ or $\Delta c_j$ are multiplicative faults. Additive faults $f_l$ and output faults $f_m$ change the state-space model according to

$$\dot{x}(t) = A\,x(t) + b\,u(t) + l\,f_l(t), \qquad y(t) = c^T x(t) + m\,f_m(t) \tag{5.19}$$

and parametric faults with slow changes result in

$$\dot{x}(t) = (A + \Delta A)\,x(t) + (b + \Delta b)\,u(t), \qquad y(t) = (c + \Delta c)^T x(t) \tag{5.20}$$

Similar representations hold for nonlinear differential equations and nonlinear state-space models, and for multi-input multi-output processes, also in discrete time.

Fig. 5.16. Linearized dynamic process model for a single-input single-output (SISO) process with additive state faults $f_l$, output faults $f_m$ and parametric faults $\Delta b_j$, $\Delta a_i$ and $\Delta c_j$
On the basis of these static and dynamic process models and fault models, different methods of model-based fault detection can be developed. These are treated in the following chapters.
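The fault models (5.19) and (5.20) can be tried out on the spring-mass-damper of Example 5.5 written in state-space form with $x = [z \;\; \dot{z}]^T$. The sketch assumes hypothetical values $m = 1$, $d = 2$, $c = 4$, a unit step force and forward-Euler integration; it compares an additive output (sensor) fault $f_m$ with a multiplicative stiffness fault $\Delta c$, which enters the matrix $A$.

```python
import numpy as np

def simulate_smd(m=1.0, d=2.0, c=4.0, F=1.0, dc=0.0, f_m=0.0, t_end=20.0, dt=1e-3):
    """Forward-Euler simulation of x' = (A + dA) x + b u, y = c_T^T x + f_m
    for the spring-mass-damper with x = [z, z'] and a step force u = F.
    dc: parametric (multiplicative) fault of the spring stiffness c,
    f_m: additive output (sensor) fault. Returns y at t_end."""
    A = np.array([[0.0, 1.0],
                  [-(c + dc) / m, -d / m]])   # the stiffness fault changes row 2 of A
    b = np.array([0.0, 1.0 / m])
    c_T = np.array([1.0, 0.0])                # measured output y = z
    x = np.zeros(2)
    for _ in range(int(round(t_end / dt))):
        x = x + dt * (A @ x + b * F)
    return c_T @ x + f_m

print(round(simulate_smd(), 3))            # F/c = 0.25
print(round(simulate_smd(f_m=0.1), 3))     # 0.25 + f_m = 0.35
print(round(simulate_smd(dc=1.0), 3))      # F/(c + dc) = 0.2
```

The output fault adds a constant offset independent of the force, while the stiffness fault changes the steady state in proportion to the applied force.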

b) Discrete-time dynamic process models 

If the process signals are sampled with sampling time $T_0$ and discrete time $k = t/T_0 = 0, 1, 2, \dots$, the continuous-time differential equations can be discretized if $T_0$ is small, or the sampled data can be approximated by amplitude-modulated $\delta$-functions for larger $T_0$. In both cases difference equations result
$$y(k) + a_1 y(k-1) + \dots + a_n y(k-n) = b_0 u(k) + b_1 u(k-1) + \dots + b_m u(k-m) \tag{5.21}$$

which, by applying the z-transform ($z = e^{T_0 s}$), leads to the z-transfer function

$$G_p(z) = \frac{y(z)}{u(z)} = \frac{B(z^{-1})}{A(z^{-1})} = \frac{b_0 + b_1 z^{-1} + \dots + b_m z^{-m}}{1 + a_1 z^{-1} + \dots + a_n z^{-n}} \tag{5.22}$$

$G_p(z)$ may also include a holding element. For more details see, e.g. [5.7], [5.1].

The relations between the parameters of $G(z)$ and $G(s)$ are simple only for orders $n = 1$ or $2$. For higher orders they become complicated.
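For the simplest case $n = 1$, the relation can be written down directly. This sketch assumes $G(s) = K/(1 + T_1 s)$ preceded by a zero-order hold, which gives $G(z) = b_1 z^{-1}/(1 + a_1 z^{-1})$ with $b_0 = 0$, $a_1 = -e^{-T_0/T_1}$ and $b_1 = K(1 - e^{-T_0/T_1})$. Because a step input is piecewise constant, the difference equation then reproduces the sampled continuous-time step response exactly; $K$, $T_1$ and $T_0$ are hypothetical values.

```python
import math

def discretize_first_order(K, T1, T0):
    """Zero-order-hold discretization of G(s) = K / (1 + T1*s):
    G(z) = b1*z**-1 / (1 + a1*z**-1), a1 = -exp(-T0/T1), b1 = K*(1 - exp(-T0/T1))."""
    a1 = -math.exp(-T0 / T1)
    b1 = K * (1.0 - math.exp(-T0 / T1))
    return a1, b1

def step_response(a1, b1, n):
    """Evaluate y(k) = -a1*y(k-1) + b1*u(k-1) for a unit step u(k) = 1, k >= 0."""
    y = [0.0]
    for _ in range(1, n):
        y.append(-a1 * y[-1] + b1 * 1.0)
    return y

K, T1, T0 = 2.0, 5.0, 0.5
a1, b1 = discretize_first_order(K, T1, T0)
y = step_response(a1, b1, 20)
# Continuous-time step response sampled at t = k*T0
exact = [K * (1.0 - math.exp(-k * T0 / T1)) for k in range(20)]
print(max(abs(p - q) for p, q in zip(y, exact)))   # rounding error only
```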


Remarks on the modelling of faults 

The additive and multiplicative faults in Figures 5.11, 5.12 and 5.16 are only idealized assumptions about the way real faults enter the model. Example 5.1 has shown that sensor faults can be modelled as constant offsets, value-dependent offsets or direction-dependent offsets. Hence, the assumption of a constant additive fault $f_y$ or $f_m$ covers only one type of fault. The actuator Example 5.2 also indicates that constant additive faults $f_u$ are only a small subset of typical faults. Most actuator faults are multiplicative, and therefore the deviations in the actuator output depend on the size of the input, see (5.15) and (5.16).

A consideration of Examples 5.3 and 5.4 and of many other processes shows that most faults are multiplicative and only in special cases additive. Such faults are therefore not covered by the assumption of constant additive faults on the measurable signals.

More details are given together with the application examples in Part V.


c) Relation between model parameters and physical parameters 

Generally, the model parameters $\theta$ (process coefficients) of continuous-time models depend on physical process parameters (or process coefficients) $p$, such as stiffness, damping factor, resistance or capacitance,

$$\theta = f(p) \tag{5.23}$$

via nonlinear algebraic relationships, see, e.g. Example 5.5. If the inverse of this relationship

$$p = f^{-1}(\theta) \tag{5.24}$$

exists, the changes $\Delta p_i$ of the physical process parameters can be calculated from the model parameter estimates $\hat{\theta}$ or their changes $\Delta\hat{\theta}_j$. This enables a better localization of faults that influence only specific physical parameters $p_i$, and therefore eases fault diagnosis. However, certain conditions must be met for a unique inversion of (5.23) [5.18], [5.10], [5.12]. The corresponding identifiability condition for the physical process parameters is given in Chapter 23.4. It states that if the parameter estimates $\hat{\theta}$ are related to products of physical parameters $z$
$$\hat{\theta} = C z \tag{5.25}$$

where $C$ is a coefficient matrix, the implicit function theorem for

$$q = \hat{\theta} - C z = 0$$

requires that the functional determinant satisfy

$$\det Q_p = \det \frac{\partial q^T}{\partial p} \neq 0 \tag{5.26}$$
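For the spring-mass-damper of Example 5.5 the inversion (5.24) exists in closed form: from $K = 1/c$, $a_1 = d/c$ and $a_2 = m/c$ follow $c = 1/K$, $d = a_1/K$ and $m = a_2/K$. The sketch below uses hypothetical estimates to show how a change in the estimate of $a_1$ alone is localized to the damping $d$.

```python
def physical_from_model(K, a1, a2):
    """Invert theta = f(p) for G(s) = K/(a2*s**2 + a1*s + 1) of the
    spring-mass-damper: c = 1/K, d = a1/K, m = a2/K."""
    c = 1.0 / K
    return {"c": c, "d": a1 * c, "m": a2 * c}

nominal = physical_from_model(K=0.02, a1=0.08, a2=0.04)
faulty = physical_from_model(K=0.02, a1=0.10, a2=0.04)    # only the a1 estimate changed
changes = {p: round(faulty[p] - nominal[p], 9) for p in nominal}
print(changes)   # {'c': 0.0, 'd': 1.0, 'm': 0.0}: the fault is located in the damping
```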


5.2.4 Nonlinear process models 

Many processes show nonlinear static and dynamic behavior, especially if wide ranges of operation are considered. In general, the nonlinear models follow directly from theoretical modelling by applying basic physical laws such as balance equations, constitutive equations and phenomenological laws, see, e.g. [5.15], [5.22], [5.14], [5.21], [5.2], [5.4] and [5.11]. Based on these so-called first principles, the nonlinear structure of the models arises, and it can be seen whether the models can be used directly for developing the fault-detection methods described for linear systems. As a multitude of nonlinear models exists, it is not possible to describe them all here. However, some typical models are summarized in the following, especially those that have proven to be good approximators for general nonlinear behavior or are well suited for process identification. They are given in the form of discrete-time systems.

a) Classical nonlinear dynamic models 

Classical approaches for the treatment of nonlinear dynamic systems are frequently based on polynomial approximators. One distinguishes between general approaches, e.g. Volterra series or Kolmogorov-Gabor polynomials, and approaches that involve special structure assumptions, such as Hammerstein, Wiener or nonlinear difference equation (NDE) models [5.3], [5.5]; see also [5.13] and Section 9.3.2.

Another class are bilinear systems. They contain a multiplication of the input signals $u_j(k)$ with the state variables $x(k)$ in the form

$$x(k+1) = A\,x(k) + B\,u(k) + \sum_{j=1}^{p} A_j\, u_j(k)\, x(k) \tag{5.27}$$

see, e.g. [5.17], [5.23]. An externally excited DC motor is an example of a bilinear system because the excitation current is multiplied with the state variables speed $\omega$ and armature current $I_A$ [5.6].
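A short simulation of (5.27) with a single input ($p = 1$) makes the nonlinearity visible: doubling the input does not double the state trajectory, as it would for a linear system. The matrices $A$, $B$ and $A_1$ below are hypothetical and chosen only so that the system stays stable.

```python
import numpy as np

def simulate_bilinear(A, B, A1, u, x0):
    """x(k+1) = A x(k) + B u(k) + A1 u(k) x(k) for a single input (p = 1)."""
    x = np.array(x0, dtype=float)
    xs = [x.copy()]
    for uk in u:
        x = A @ x + B * uk + (A1 * uk) @ x
        xs.append(x.copy())
    return np.array(xs)

A = np.array([[0.9, 0.0], [0.1, 0.8]])
B = np.array([1.0, 0.0])
A1 = np.array([[0.0, 0.0], [0.0, 0.05]])    # input-state coupling acts on state 2 only
u = np.ones(50)

x_1 = simulate_bilinear(A, B, A1, u, [0.0, 0.0])
x_2 = simulate_bilinear(A, B, A1, 2 * u, [0.0, 0.0])
print(np.allclose(x_2[:, 0], 2 * x_1[:, 0]))   # True: state 1 is not coupled to u*x
print(np.allclose(x_2, 2 * x_1))               # False: superposition fails for state 2
```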
b) Artificial neural networks

For a generally applicable modelling of nonlinear systems, methods that do not require specific knowledge of the process structure are of interest. Artificial neural networks fulfil this requirement and are described in Chapter 9.


5.3 Problems 

1) What types of faults can be distinguished with regard to their time behavior?

2) What are the differences between additive and multiplicative faults? Take a sensor as an example.

3) What kinds of faults are typical for temperature, pressure and flow sensors?

4) What types of faults can be modelled for contamination of a thermocouple installed in the core of a high-temperature fluid flow or in the wall of the pipe?

5) What types of faults are typical for pneumatic diaphragm actuators?

6) How can leaks in an oil pipeline be modelled by using mass balance equations or resistance laws?

7) Consider an electrical circuit with parameters R, L and C. Which faults in these 
parameters can be detected based on dynamic or static models? 

8) Consider a second-order mechanical system as in Example 5.5. Draw the signals for the input $\Delta F_1$ and the output $z$ for

a) additive stepwise faults of the input and output variables and a constant input signal;

b) parametric faults of $c$ and $d$ with and without stepwise input excitation.

9) What are the advantages and disadvantages of continuous-time or discrete-time 
models for model-based fault-detection methods? 

10) Consider an electrical RC circuit. Which of the parameters $R$ and $C$ can be determined by parameter estimation with two measured signals $U_{in}(t)$ and $U_{out}(t)$ or $I(t)$?
Signal models


Many processes are characterized by oscillating or cyclic time behavior. This holds, for example, for rotating machines or alternating currents. The resulting signals are then periodic or contain periodic parts. Random processes such as acoustic noise, driving over road surfaces, turbulent flow, or the on-off switching of many consumers in electrical or water networks result in stochastic signals. Both signal types can be used for fault detection if faults in the process cause changes in their models. Therefore, the generation of these signals and some basic signal models are considered.


6.1 Harmonic oscillations 

For undamped periodic signals with period $T_p$, the following expression holds in general

$$y(t) = y(t + T_p) \tag{6.1}$$


6.1.1 Single oscillations 

A harmonic steady-state oscillation is described by a phase-shifted sine function

$$y(t) = y_0 \sin(2\pi f_0 t + \varphi) = y_0 \sin(\omega_0 t + \varphi) \tag{6.2}$$

with amplitude $y_0$, frequency $f_0 = 1/T_p$, angular frequency $\omega_0 = 2\pi f_0$ and phase angle $\varphi$. A damped harmonic oscillation is given by

$$y(t) = y_0\, e^{-\delta t} \sin(\omega_0 t + \varphi) \tag{6.3}$$

with the damping constant $\delta$.

Examples of the formation of such oscillations in mechanical systems have been shown in [6.6]. In the following, combinations of different harmonic oscillations and their models in the time domain are considered.
6.1.2 Superposition

The simplest form of combined oscillations results from superposition (addition)

$$y(t) = \sum_{\nu=1}^{m} y_{0\nu}\, e^{-\delta_\nu t} \sin(\omega_\nu t + \varphi_\nu) \tag{6.4}$$

The superposition of two undamped oscillations with the angular frequencies $\omega_1$ and $\omega_2$ yields the amplitude spectrum shown in Figure 6.1.

Fig. 6.1. Amplitude spectrum for the superposition of two oscillations


6.1.3 Amplitude modulation 

An amplitude-modulated oscillation is obtained if the amplitude $y_{01}$ of the carrier signal with angular frequency $\omega_1$ is altered by a second oscillation, the modulation oscillation, with amplitude $y_{02}$ and angular frequency $\omega_2$. This results in the multiplicative operation

$$y(t) = y_1(t)\, y_2(t) = y_{01} \left[ y_{02} \sin(\omega_2 t + \varphi_2) \right] \sin(\omega_1 t + \varphi_1) \tag{6.5}$$

Using the trigonometric relation

$$\sin\alpha \, \sin\beta = \frac{1}{2} \left[ \cos(\alpha - \beta) - \cos(\alpha + \beta) \right] \tag{6.6}$$

one obtains

$$y(t) = \frac{1}{2} y_{01} y_{02} \left[ \cos\left((\omega_1 - \omega_2) t + \varphi_1 - \varphi_2\right) - \cos\left((\omega_1 + \omega_2) t + \varphi_1 + \varphi_2\right) \right] \tag{6.7}$$

Thus, two oscillation components of the same amplitude with the difference and sum frequencies appear, as shown in Figure 6.2.
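The result (6.7) can be checked with a discrete Fourier transform of a sampled amplitude-modulated signal. The sketch assumes hypothetical frequencies $f_1 = 50$ Hz (carrier) and $f_2 = 5$ Hz (modulation), zero phases, and a record length that contains an integer number of periods so that each component falls exactly on one frequency bin.

```python
import numpy as np

fs, N = 1000.0, 10000                         # sampling rate [Hz] and samples (10 s)
t = np.arange(N) / fs
f1, f2 = 50.0, 5.0                            # carrier and modulation frequency
y01, y02 = 1.0, 0.8
y = y01 * (y02 * np.sin(2 * np.pi * f2 * t)) * np.sin(2 * np.pi * f1 * t)   # (6.5)

amp = np.abs(np.fft.rfft(y)) * 2.0 / N        # single-sided amplitude spectrum
freqs = np.fft.rfftfreq(N, 1.0 / fs)
top2 = np.argsort(amp)[-2:]                   # indices of the two largest lines
print(sorted(round(float(f), 1) for f in freqs[top2]))    # [45.0, 55.0]
print(round(float(amp.max()), 3))                          # 0.4 = y01*y02/2
```

No line appears at the carrier frequency itself, since (6.5) is a pure product of the two oscillations.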
Fig. 6.2. Amplitude spectrum with amplitude modulation

6.1.4 Frequency and phase modulation 

A frequency modulation of the carrier oscillation is obtained by

$$y(t) = y_{01} \sin\left[ \left( \omega_1 + y_{02} \sin(\omega_2 t + \varphi_2) \right) t + \varphi_1 \right] \tag{6.8}$$

and a modulation of the phase angle by

$$y(t) = y_{01} \sin\left( \omega_1 t + y_{02} \sin(\omega_2 t + \varphi_2) + \varphi_1 \right) \tag{6.9}$$

These modulations are used particularly in communication engineering, since the useful information is contained in the frequency and phase of the carrier signal. In this case, disturbances of the amplitude $y_{01}$ have practically no influence on the reconstruction of the useful signal in a receiver (demodulation).

Example 6.1: 

Figure 6.3 shows the results of the amplitude, phase and frequency modulation for a 
signal composed of two undamped partial oscillations. 


6.1.5 Beating (Libration) 

Now the superposition of two oscillations with angular frequencies $\omega_1$ and $\omega_2$, a small difference $\Delta\omega = \omega_2 - \omega_1$ and the same amplitude is considered

$$y_1(t) = y_0 \sin(\omega_1 t + \varphi_1)$$

$$y_2(t) = y_0 \sin\left[ (\omega_1 + \Delta\omega) t + \varphi_2 \right]$$

Using the trigonometric relation

$$\sin\alpha + \sin\beta = 2 \sin\left( \frac{\alpha + \beta}{2} \right) \cos\left( \frac{\alpha - \beta}{2} \right)$$

Fig. 6.3. Time course of an oscillation with: (a) amplitude modulation $y(t) = \sin(2\pi \cdot 10\,\mathrm{Hz} \cdot t) \cdot \sin(2\pi \cdot 1\,\mathrm{Hz} \cdot t + \pi/3)$; (b) frequency modulation $y(t) = 2 \sin[(2\pi \cdot 10\,\mathrm{Hz} \cdot t - 0.5) \cdot \sin(2\pi \cdot 1\,\mathrm{Hz} \cdot t + \pi/3)]$; (c) phase modulation $y(t) = 0.5 \sin(2\pi \cdot 1\,\mathrm{Hz} \cdot t + 2 \sin(2\pi \cdot 10\,\mathrm{Hz} \cdot t) + \pi/3)$
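one then obtains

$$y(t) = y_1(t) + y_2(t) = y_0(t) \sin\left[ \left( \omega_1 + \frac{\Delta\omega}{2} \right) t + \varphi \right] = y_0(t) \sin\left[ \frac{\omega_1 + \omega_2}{2}\, t + \varphi \right] \tag{6.10}$$

with

$$y_0(t) = 2 y_0 \cos\left[ \frac{1}{2} \left( \Delta\omega\, t - \varphi_1 + \varphi_2 \right) \right] = 2 y_0 \cos\left[ \frac{\omega_2 - \omega_1}{2}\, t - \frac{\varphi_1 - \varphi_2}{2} \right], \qquad \varphi = \frac{1}{2} (\varphi_1 + \varphi_2) \tag{6.11}$$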


This yields a sinusoidal oscillation with the average frequency $(\omega_1 + \omega_2)/2$ and an amplitude $y_0(t)$ that varies cosinusoidally with half the difference frequency $\Delta\omega/2$, a so-called beating. A superposition of oscillations with adjacent frequencies thus leads to an amplitude-modulated signal whose carrier signal has the frequency $(\omega_1 + \omega_2)/2$ and whose modulation signal has the frequency $(\omega_2 - \omega_1)/2$.


Example 6.2: 

The superposition of the oscillations

$$y_1(t) = \sin(2\pi \cdot 1\,\mathrm{Hz} \cdot t)$$

$$y_2(t) = \sin(2\pi \cdot 1.01\,\mathrm{Hz} \cdot t)$$

yields a beating with a frequency of $\Delta f/2 = 0.005$ Hz, as shown in Figure 6.4.


Fig. 6.4. Time course of a beating


□ 
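The closed form (6.10), (6.11) can be verified numerically for Example 6.2. This sketch assumes $y_0 = 1$ and zero phases, as in the example, and an arbitrary sampling grid over 200 s, which is one period of the envelope cosine with frequency $\Delta f/2 = 0.005$ Hz.

```python
import numpy as np

t = np.linspace(0.0, 200.0, 200001)        # one period of the envelope cosine
f1, f2 = 1.0, 1.01                          # frequencies of Example 6.2 [Hz]
y = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# Closed form with y0 = 1 and phi1 = phi2 = 0:
envelope = 2.0 * np.cos(np.pi * (f2 - f1) * t)    # y0(t) = 2 cos(delta_omega*t/2)
carrier = np.sin(np.pi * (f1 + f2) * t)            # sin((omega1 + omega2)*t/2)
print(np.max(np.abs(y - envelope * carrier)))     # rounding error only
```

The magnitude $|y_0(t)|$ repeats every $1/\Delta f = 100$ s, so the signal swells and fades twice during the 200 s window.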
6.1.6 Superposition and nonlinear characteristics

Now the case of a signal $y(t)$ composed of two superimposed oscillations is considered

$$y(t) = y_1(t) + y_2(t)$$

$$y_1(t) = y_{01} \sin(\omega_1 t + \varphi_1), \qquad y_2(t) = y_{02} \sin(\omega_2 t + \varphi_2) \tag{6.12}$$

This signal is fed into a subsequent nonlinear characteristic curve

$$z(y) = y^2 \tag{6.13}$$

Then

$$z(t) = y_{01}^2 \sin^2(\omega_1 t + \varphi_1) + 2 y_{01} y_{02} \sin(\omega_1 t + \varphi_1) \sin(\omega_2 t + \varphi_2) + y_{02}^2 \sin^2(\omega_2 t + \varphi_2) = y_{01}^2 \sin^2(\omega_1 t + \varphi_1) + y_{02}^2 \sin^2(\omega_2 t + \varphi_2) + 2 y_{01} \left[ y_{02} \sin(\omega_2 t + \varphi_2) \right] \sin(\omega_1 t + \varphi_1) \tag{6.14}$$

applies to the resulting signal. Thus, squared sinusoidal oscillations at each fundamental frequency and an amplitude-modulated sinusoidal oscillation emerge. A further transformation by means of the trigonometric relation

$$\sin^2\alpha = \frac{1}{2} \left[ 1 - \cos 2\alpha \right]$$

yields

$$z(t) = \frac{1}{2} \left( y_{01}^2 + y_{02}^2 \right) - \frac{1}{2} y_{01}^2 \cos(2\omega_1 t + 2\varphi_1) - \frac{1}{2} y_{02}^2 \cos(2\omega_2 t + 2\varphi_2) + y_{01} y_{02} \cos\left[ (\omega_1 - \omega_2) t + \varphi_1 - \varphi_2 \right] - y_{01} y_{02} \cos\left[ (\omega_1 + \omega_2) t + \varphi_1 + \varphi_2 \right] \tag{6.15}$$

When two oscillations with the angular frequencies $\omega_1$ and $\omega_2$ pass through a square-law nonlinear characteristic, the output contains the angular frequencies

$$2\omega_1, \quad 2\omega_2, \quad \omega_1 - \omega_2, \quad \omega_1 + \omega_2$$

and an additional offset occurs, as shown in Figure 6.5. Thus, nonlinear transfer elements lead to oscillations with new frequencies at the output.
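The frequency content predicted by (6.15) can be confirmed with an FFT of the squared signal. The sketch assumes hypothetical values $f_1 = 30$ Hz, $f_2 = 40$ Hz, $y_{01} = 1$, $y_{02} = 0.5$ and zero phases, so (6.15) predicts an offset of 0.625 and lines at 10 Hz (amplitude 0.5), 60 Hz (0.5), 70 Hz (0.5) and 80 Hz (0.125).

```python
import numpy as np

fs, N = 1000.0, 10000                      # sampling rate [Hz] and samples (10 s)
t = np.arange(N) / fs
f1, f2, y01, y02 = 30.0, 40.0, 1.0, 0.5    # oscillations (6.12) with zero phases
y = y01 * np.sin(2 * np.pi * f1 * t) + y02 * np.sin(2 * np.pi * f2 * t)
z = y**2                                   # square-law characteristic (6.13)

amp = np.abs(np.fft.rfft(z)) / N
amp[1:] *= 2.0                             # single-sided amplitudes; DC bin is the mean
freqs = np.fft.rfftfreq(N, 1.0 / fs)
lines = {round(float(f), 1): round(float(a), 4) for f, a in zip(freqs, amp) if a > 1e-6}
print(lines)   # lines only at 0, 10, 60, 70 and 80 Hz
```

None of the output lines lies at the input frequencies 30 Hz and 40 Hz; all of them are newly generated by the nonlinearity.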


6.2 Stochastic signals 

The treatment of stochastic signals depends on whether they are considered in continuous time or in discrete time. Therefore, both cases are briefly discussed.
Fig. 6.5. Effect of a square-law characteristic curve on the amplitude spectrum of two superimposed oscillations


6.2.1 Continuous-time stochastic signals 

The time history of stochastic signals is random and can therefore not be described exactly. With the aid of statistical methods, probability calculus and averaging, some properties of these signals can be stated. Measurable stochastic signals are not completely random but show internal relations, which can be expressed by mathematical signal models.

Because of its random character, one signal source provides a family (ensemble) of random functions

$$\{x_1(t), x_2(t), \dots, x_n(t)\}$$

This ensemble of signals represents a stochastic process. One random function is called a sample function.

The statistical treatment of stochastic signals uses probability density functions and results in simplified equations for stationary signals, whose probability density functions are independent of time. Applying the ergodic hypothesis means that the same statistical information as from ensemble averaging can be obtained by averaging one single sample function $x(t)$ over an infinitely long time interval. Then the following averaged characteristic values of a stationary stochastic signal can be given

• average value

$$\bar{x} = E\{x(t)\} = \lim_{T\to\infty} \frac{1}{T} \int_0^T x(t)\,dt \tag{6.16}$$

• quadratic average value (variance)

$$\sigma_x^2 = E\{(x(t) - \bar{x})^2\} = \lim_{T\to\infty} \frac{1}{T} \int_0^T (x(t) - \bar{x})^2\,dt \tag{6.17}$$

• auto-correlation function (ACF)

$$\Phi_{xx}(\tau) = E\{x(t)\,x(t+\tau)\} = \lim_{T\to\infty} \frac{1}{T} \int_0^T x(t)\,x(t+\tau)\,dt \tag{6.18}$$

• auto-covariance function

$$R_{xx}(\tau) = \mathrm{cov}[x, \tau] = E\{(x(t) - \bar{x})(x(t+\tau) - \bar{x})\} = E\{x(t)\,x(t+\tau)\} - \bar{x}^2 \tag{6.19}$$

The auto-correlation function and the auto-covariance function express the internal similarity of the random signal. For more details see [6.7], [6.2], [6.4]. Some examples are shown in Figure 6.6.

The statistical relation between two different stochastic signals $x(t)$ and $y(t)$ is expressed by the

• cross-correlation function (CCF)

$$\Phi_{xy}(\tau) = E\{x(t)\,y(t+\tau)\} = \lim_{T\to\infty} \frac{1}{T} \int_0^T x(t)\,y(t+\tau)\,dt = \lim_{T\to\infty} \frac{1}{T} \int_0^T x(t-\tau)\,y(t)\,dt \tag{6.20}$$

Through Fourier transformation of the auto-covariance function, one obtains in the frequency domain the

• power density

$$S_{xx}(i\omega) = \mathcal{F}\{R_{xx}(\tau)\} = \int_{-\infty}^{\infty} R_{xx}(\tau)\, e^{-i\omega\tau}\,d\tau = 2 \int_0^{\infty} R_{xx}(\tau) \cos\omega\tau\,d\tau \tag{6.21}$$


A special stochastic signal is white noise. Its values at different times are statistically independent, resulting in the covariance function

$$R_{xx}(\tau) = S_0\,\delta(\tau) \tag{6.22}$$

This signal has a constant power density $S_0$ = const. for all frequencies and is not realizable. Its variance is infinite.

The auto-correlation function can also be applied to periodic signals and then also becomes periodic. Hence, it differs significantly from that of stochastic signals. Therefore, correlation functions are very well suited for separating periodic and stochastic signals [6.5].
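The separating property can be illustrated with sampled data, using time averages over a long finite record in place of the limits in (6.18). This sketch assumes a unit-amplitude sine with a period of 20 samples plus unit-variance white noise; the estimated ACF contains the noise variance only at $\tau = 0$ and otherwise follows the periodic ACF $\frac{1}{2}\cos(2\pi\tau/20)$ of the sine.

```python
import numpy as np

def acf(x, max_lag):
    """Time-average estimate of Phi_xx(tau) = E{x(k) x(k+tau)}, tau = 0..max_lag."""
    N = len(x)
    return np.array([np.dot(x[:N - tau], x[tau:]) / (N - tau) for tau in range(max_lag + 1)])

rng = np.random.default_rng(1)
k = np.arange(100_000)
period = 20                                                       # samples per period
x = np.sin(2 * np.pi * k / period) + rng.standard_normal(k.size)  # harmonic + white noise

phi = acf(x, 60)
periodic_part = 0.5 * np.cos(2 * np.pi * np.arange(61) / period)
print(round(phi[0], 2), round(phi[10], 2), round(phi[20], 2))    # roughly 1.5, -0.5, 0.5
print(np.max(np.abs(phi[1:] - periodic_part[1:])))              # small: noise has decayed
```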
Fig. 6.6. Models of different stochastic and periodic signals: (a) white noise; (b) high-frequency noise; (c) low-frequency noise; (d) harmonic oscillation; (e) harmonic oscillation and stochastic noise; (f) DC signal


6.2.2 Discrete-time stochastic signals 

Discrete-time stochastic signals usually result from sampling continuous-time stochastic signals with sampling time $T_0$. Then the discrete time $k = t/T_0 = 0, 1, 2, \dots$ is introduced. The signal model values follow directly from the previous ones by replacing the integral with a sum.

• average value

$$\bar{x} = E\{x(k)\} = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} x(k) \tag{6.23}$$

• quadratic average value (variance)

$$\sigma_x^2 = E\{[x(k) - \bar{x}]^2\} = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} [x(k) - \bar{x}]^2 \tag{6.24}$$

• auto-correlation function (ACF)

$$\Phi_{xx}(\tau) = E\{x(k)\,x(k+\tau)\} = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} x(k)\,x(k+\tau) \tag{6.25}$$

• auto-covariance function

$$R_{xx}(\tau) = \mathrm{cov}[x, \tau] = E\{x(k)\,x(k+\tau)\} - \bar{x}^2 \tag{6.26}$$

• cross-correlation function (CCF)

$$\Phi_{xy}(\tau) = E\{x(k)\,y(k+\tau)\} = \lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} x(k)\,y(k+\tau) \tag{6.27}$$

• power density

$$S^*_{xx}(i\omega) = \mathcal{F}\{R_{xx}(\tau)\} = \sum_{\tau=-\infty}^{\infty} R_{xx}(\tau)\, e^{-i\omega T_0 \tau} \tag{6.28}$$

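The discrete-time white noise is described by

$$R_{xx}(\tau) = \mathrm{cov}[x, \tau] = \sigma_x^2\,\delta(\tau) \tag{6.29}$$

where $\delta(\tau)$ is the Kronecker delta function

$$\delta(\tau) = \begin{cases} 1 & \text{for } \tau = 0 \\ 0 & \text{for } \tau \neq 0 \end{cases} \tag{6.30}$$

In contrast to continuous-time white noise, discrete-time white noise is realizable and possesses a finite variance.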
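With a finite record, the limits in (6.23), (6.24) and (6.26) are replaced by finite sums. This sketch assumes normally distributed white noise with $\sigma = 1$ and 50 000 samples; the normalized auto-covariance estimate then approximates the Kronecker delta of (6.29), (6.30).

```python
import numpy as np

def sample_stats(x, max_lag):
    """Finite-N estimates of the mean (6.23), variance (6.24) and auto-covariance (6.26)."""
    N = len(x)
    mean = x.sum() / N
    xc = x - mean
    var = np.dot(xc, xc) / N
    cov = np.array([np.dot(xc[:N - tau], xc[tau:]) / N for tau in range(max_lag + 1)])
    return mean, var, cov

rng = np.random.default_rng(2)
v = rng.normal(0.0, 1.0, 50_000)            # discrete-time white noise, sigma = 1
mean, var, cov = sample_stats(v, 5)
print(round(mean, 2), round(var, 2))        # close to 0 and 1
print(np.round(cov / var, 2))               # close to [1, 0, 0, 0, 0, 0]
```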

A further model representation of stochastic signals is possible by stochastic difference equations. They result from filtering discrete-time white noise $v(k)$ with the difference equation

$$y(k) + c_1 y(k-1) + \dots + c_n y(k-n) = d_0 v(k) + d_1 v(k-1) + \dots + d_m v(k-m) \tag{6.31}$$

The signal $y(k)$ is thus the output of a fictitious filter with the z-transfer function

$$G_F(z) = \frac{y(z)}{v(z)} = \frac{d_0 + d_1 z^{-1} + \dots + d_m z^{-m}}{1 + c_1 z^{-1} + \dots + c_n z^{-n}} = \frac{D(z)}{C(z)} \tag{6.32}$$

with white noise $v(k)$ of zero mean and variance $\sigma_v^2 = 1$ as input signal. Special cases are the autoregressive process (AR) with

$$G_F(z) = \frac{d_0}{C(z)} \tag{6.33}$$

and the moving-average process (MA) with

$$G_F(z) = D(z) \tag{6.34}$$

Therefore, (6.32) represents an ARMA process. For more details, see [6.3], [6.1], [6.5].
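The stochastic difference equation (6.31) can be evaluated directly to generate such signals. This sketch assumes an AR process of first order with a hypothetical coefficient $c_1 = -0.7$ and $d_0 = 1$; for such a process the normalized auto-covariance at lag 1 is $-c_1$ and the variance is $1/(1 - c_1^2)$, which the sample estimates should approach.

```python
import numpy as np

def arma_filter(c, d, v):
    """Stochastic difference equation (6.31):
    y(k) + c1 y(k-1) + ... + cn y(k-n) = d0 v(k) + d1 v(k-1) + ... + dm v(k-m),
    with c = [c1, ..., cn], d = [d0, ..., dm] and zero initial conditions."""
    y = np.zeros(len(v))
    for k in range(len(v)):
        acc = sum(d[j] * v[k - j] for j in range(len(d)) if k - j >= 0)
        acc -= sum(c[i - 1] * y[k - i] for i in range(1, len(c) + 1) if k - i >= 0)
        y[k] = acc
    return y

rng = np.random.default_rng(3)
v = rng.standard_normal(20_000)          # white noise, zero mean, variance 1
c1 = -0.7
y = arma_filter([c1], [1.0], v)          # AR(1): G_F(z) = 1 / (1 + c1 z^-1)

yc = y - y.mean()
rho1 = np.dot(yc[:-1], yc[1:]) / np.dot(yc, yc)
print(round(rho1, 2))                    # close to -c1 = 0.7
print(round(y.var(), 2))                 # close to 1/(1 - c1**2), about 1.96
```

Passing a nonempty `d` list with more than one entry and an empty `c` list gives an MA process with the same function.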
6.3 Problems

1) An acoustic harmonic signal with $f_1$ = 1000 Hz is amplitude-modulated with $f_2$ = 50 Hz. Determine the frequencies of the resulting oscillations.

2) The amplitude $y_{01} = 1$ of a harmonic signal with frequency $f_1$ = 100 Hz is modulated with $f_2$ = 20 Hz and amplitude $y_{02} = 0.5$. Determine the resulting frequencies and show them in an amplitude spectrum diagram.

3) The two engines of an aircraft run at 2500 and 2510 rpm. It is assumed that the six-cylinder four-stroke engines generate noise at the ignition frequency. Which frequencies will be heard?

4) The electromagnetic force on the armature of a solenoid is proportional to the square of the magnetomotive force and of the current, respectively. Which frequencies arise in the magnetic force for an alternating current of 50 Hz or 60 Hz?

5) State the differences between auto-correlation functions and auto-covariance 
functions. 

6) Draw the auto-correlation function of white noise for continuous-time and discrete-time signals.

7) Which sensors can be mounted on the crankcase walls of a gasoline engine to detect knocking? The resulting oscillations lie within 20 to 30 kHz. Which methods can be used for knock detection? Write down the corresponding equations.
Fault detection with limit checking


The simplest and most frequently used method for fault detection is limit checking of a directly measured variable $Y(t)$. Here, the measured variables of a process are monitored and checked as to whether their absolute values or trends exceed certain thresholds. A further possibility is to check their plausibility.


7.1 Limit checking of absolute values 

Generally, two limit values, called thresholds, are preset: a maximum value $Y_{max}$ and a minimum value $Y_{min}$. A normal state holds when

$$Y_{min} < Y(t) < Y_{max} \tag{7.1}$$

which means that the process is in a normal situation if the monitored variable stays within a certain tolerance zone. Exceeding one of the thresholds then indicates a fault somewhere in the process, compare Figure 7.1. This simple method is applied in almost all process automation systems. Examples are the oil pressure (lower limit) or the coolant water temperature (upper limit) of combustion engines, the pressure of the circulating fluid in refrigerators (lower limit), or the control error of a control loop. The thresholds are mostly selected based on experience and represent a compromise.

On the one hand, false alarms caused by normal fluctuations of the variable should be avoided; on the other hand, faulty deviations should be detected early. Therefore a trade-off exists between too narrow and too wide thresholds.
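As a minimal sketch of limit checking according to (7.1), the following Python function flags every sample that leaves the tolerance zone. The function name, the returned labels and the example limit of 105 °C are illustrative assumptions, not part of the method description.

```python
def limit_check(samples, y_min, y_max):
    """Return (k, label) for every sample Y(k) outside Y_min < Y(k) < Y_max, cf. (7.1)."""
    alarms = []
    for k, y in enumerate(samples):
        if y >= y_max:
            alarms.append((k, "high"))   # upper threshold reached or exceeded
        elif y <= y_min:
            alarms.append((k, "low"))    # lower threshold reached or undershot
    return alarms


# Example: coolant temperature in degrees C with an assumed upper limit of 105
print(limit_check([88.0, 95.0, 106.5, 90.0], y_min=-40.0, y_max=105.0))  # [(2, 'high')]
```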


7.2 Trend checking 

A further simple possibility is to calculate the first derivative $\dot{Y}(t) = dY(t)/dt$, i.e. the trend of the monitored variable, and to check whether

$\dot{Y}_{\min} < \dot{Y}(t) < \dot{Y}_{\max}$ (7.2)
If relatively small thresholds are selected, an alarm can be obtained earlier than with limit checking of the absolute value, see Figure 7.1b. Trend checking is, for example, applied to oil pressures and vibrations of oil bearings of turbines or to wear measures of machines.
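Trend checking (7.2) can be sketched by approximating $dY/dt$ with the backward difference $(Y(k) - Y(k-1))/T_0$. The function name and the returned tuples are assumptions for illustration.

```python
def trend_check(samples, t0, dy_min, dy_max):
    """Flag samples whose estimated trend leaves dY_min < dY/dt < dY_max, cf. (7.2).

    The derivative is approximated by the backward difference (Y(k) - Y(k-1)) / T0.
    Returns a list of (k, estimated trend) for every violation.
    """
    alarms = []
    for k in range(1, len(samples)):
        dy = (samples[k] - samples[k - 1]) / t0
        if not (dy_min < dy < dy_max):
            alarms.append((k, dy))
    return alarms
```

Because differentiation amplifies measurement noise, the signal would in practice usually be smoothed, e.g. by a low-pass filter, before the difference is formed.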

Limit checking of absolute values and trends can also be combined. This requires, however, a mutual coordination of the thresholds. One possibility is to make the thresholds of the absolute values dependent on the trend, e.g. $Y_{\max} = f(\dot{Y} > 0)$ and $Y_{\min} = f(\dot{Y} < 0)$, in order to detect fast-developing deviations early and to avoid false alarms for small trends if the value is still far away from the threshold, compare Figure 7.2, [7.17]. In some applications it is advantageous to make the thresholds adaptive, i.e. a function of other variables. For example, the thresholds for residuals are set depending on the input excitation of the process, see Section 7.5.
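Figure 7.2 and [7.17] leave the function f open. The following sketch assumes, purely for illustration, a linear law in which a rising trend lowers the upper threshold and a falling trend raises the lower threshold; the function name and the `gain` parameter are hypothetical.

```python
def combined_check(y, dy, y_min, y_max, gain):
    """Limit check with trend-dependent thresholds (hypothetical linear law).

    A rising trend (dy > 0) lowers the upper threshold, a falling trend (dy < 0)
    raises the lower threshold, so that fast deviations are detected earlier.
    Returns True if an alarm is raised.
    """
    upper = y_max - gain * max(dy, 0.0)
    lower = y_min - gain * min(dy, 0.0)
    return y >= upper or y <= lower
```

With a small trend the thresholds stay close to $Y_{\max}$ and $Y_{\min}$, which is what avoids false alarms when the value is still far from the limit.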

A further improvement of limit checking can be realized by applying signal prediction. By using polynomial regression models

$Y(k) = a_0 + a_1 k + a_2 k^2 + \ldots$ (7.3)

with the discrete time $k = t/T_0$, and recursive least squares parameter estimation, the signal can be predicted N samples ahead. This may give a better prediction than the trend alone. The predicted signal then shows relatively early the danger of exceeding a threshold, or avoids a false alarm if the signal returns to the normal zone without intervention. The prediction can also be made by using stochastic difference equations for the randomly changing monitored signal, [7.17], [7.19].
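The prediction idea can be sketched as follows. As a simplification, this sketch fits (7.3) by ordinary batch least squares via the normal equations instead of the recursive least squares estimator mentioned in the text; the function names are hypothetical.

```python
def fit_polynomial(samples, degree):
    """Least-squares fit of Y(k) = a0 + a1*k + ... + a_d*k^d for k = 0 .. len(samples)-1."""
    n = degree + 1
    ks = range(len(samples))
    # Normal equations (Phi^T Phi) a = Phi^T y with the regressors 1, k, k^2, ...
    A = [[float(sum(k ** (i + j) for k in ks)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * k ** i for k, y in zip(ks, samples))) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (b[i] - sum(A[i][j] * a[j] for j in range(i + 1, n))) / A[i][i]
    return a


def predict_ahead(samples, degree, n_ahead):
    """Predict Y(k_last + n_ahead) with the fitted polynomial (7.3)."""
    a = fit_polynomial(samples, degree)
    k = len(samples) - 1 + n_ahead
    return sum(coef * k ** i for i, coef in enumerate(a))
```

An alarm would then be raised as soon as the predicted value reaches $Y_{\max}$ (or $Y_{\min}$), before the measured value itself does.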






Fig. 7.2. Combination of limit checking for absolute values and trends. The thresholds $Y_{\max}$ and $Y_{\min}$ are functions of the trend $\dot{Y}$

7.3 Change detection with binary thresholds 

7.3.1 Estimation of mean and variance 

The monitored variables are usually stochastic variables $Y_i(t)$ with a certain probability density function $p(Y_i)$, mean value and variance

$\mu_i = E\{Y_i(t)\}; \quad \sigma_i^2 = E\{[Y_i(t) - \mu_i]^2\}$ (7.4)

as nominal values for the non-faulty process. Changes are then expressed by

$\Delta Y_i = E\{Y_i(t) - \mu_i\}$ and $\Delta\sigma_i^2 = E\{[\sigma_i(t) - \sigma_i]^2\}$ (7.5)

for $t > t_F$, where $t_F$ is the time of fault occurrence, which is unknown.

If the mean and standard deviation before the change caused by a fault are described by $\mu_0$ and $\sigma_0$, and after the change has appeared by $\mu_1$ and $\sigma_1$, the change detection problem is depicted in Figure 7.3, assuming a normal probability distribution of the variable Y(t). The following cases of changes can then be distinguished:

1) the mean changes, $\mu_1 = \mu_0 + \Delta\mu$; the standard deviation remains constant, $\sigma_1 = \sigma_0$;

2) the mean does not change, $\mu_1 = \mu_0$; the standard deviation changes, $\sigma_1 = \sigma_0 + \Delta\sigma$;

3) both mean and standard deviation change.

As an example, case (1) is now considered. If the probability densities do not overlap significantly, one can use a fixed threshold

$\Delta Y_{tol} = k\,\sigma_0$ (7.6)

with, e.g., $k > 2$, to detect the change just by observing the average $\hat{\mu}(N, t)$. In selecting the threshold, a compromise has to be made between the detection of relatively small changes and false alarms.
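A minimal sketch of the fixed threshold (7.6), assuming that the nominal values $\mu_0$ and $\sigma_0$ are known: the average of a block of samples is compared with $\mu_0 \pm k\sigma_0$. The function name and the default k = 2 are illustrative.

```python
def mean_shift_alarm(samples, mu0, sigma0, k=2.0):
    """Fixed threshold (7.6): alarm if |average - mu0| > k * sigma0."""
    mean = sum(samples) / len(samples)
    return abs(mean - mu0) > k * sigma0
```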
Fig. 7.3. Normal probability density functions of the observed variable Y for the nominal state
(index 0) and changed (faulty) state (index 1) 


However, the detection problem becomes more involved if the change of the mean

$\Delta\mu = \mu_1 - \mu_0$ (7.7)

is small compared to the standard deviation, say $k < 1$ in (7.6). Then statistical tests have to be applied.

The detection of changes of the random variable Y(k) can be performed off-line or on-line in real time. For off-line change detection within a sample length N it has to be tested whether, at some unknown time $t_F$, a change in Y(k) from $Y_0$ to $Y_1$ occurred. This can only be done after storing all data. For fault detection in real time, on-line change detection is of more interest. Here, at every time k it has to be decided whether a change from $Y_0$ to $Y_1$ has happened. This means that especially sequential or recursive tests are of interest for fault detection. The off-line case is easier to decide, because more measurements are available.

a) Statistics of the observation 

The decision on a change of the observed variable can now be placed in a statistical context. This is of interest if small changes have to be detected in a noisy environment. It is assumed that the observed variable Y(t) is a scalar function of time and is sampled at the discrete times $t = kT_0$, where $T_0$ is the sampling time. It is further assumed that Y(k) is a random variable with probability density p(Y), for example a Gaussian or normal distribution, see Figure 7.3,

$p(Y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\, e^{-\frac{(Y - \mu_Y)^2}{2\sigma_Y^2}}$ (7.8)


with the mean (first moment)

$\mu_Y = E\{Y(k)\}$ (7.9)

and variance (second central moment)

$\sigma_Y^2 = E\{(Y(k) - \mu_Y)^2\}$ (7.10)

Then the probability that Y(k) lies within certain ranges is

$P(\mu_Y - \sigma_Y < Y < \mu_Y + \sigma_Y) = 68.3\,\%$
$P(\mu_Y - 2\sigma_Y < Y < \mu_Y + 2\sigma_Y) = 95.4\,\%$ (7.11)

The considered variable can also be described by

$Y(k) = \mu_Y + n(k)$ (7.12)

where n(k) is a zero-mean random variable (noise) with $\sigma_n = \sigma_Y$.
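The percentages in (7.11) follow from the error function, since for a normal distribution $P(|Y - \mu_Y| < n\sigma_Y) = \mathrm{erf}(n/\sqrt{2})$. A quick check in Python (the helper name is hypothetical):

```python
import math


def prob_within(n_sigma):
    """P(mu - n*sigma < Y < mu + n*sigma) for a normally distributed Y."""
    return math.erf(n_sigma / math.sqrt(2.0))


print(round(prob_within(1), 3), round(prob_within(2), 3))  # 0.683 0.954
```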


b) Estimation of mean and variance 

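For the estimation of the mean and variance it can usually be assumed that the signal is ergodic. Therefore the estimates can be formulated for one sample function Y(k) as a function of time

$\hat{\mu}_Y(N) = \frac{1}{N}\sum_{k=1}^{N} Y(k)$ (7.13)

$\hat{\sigma}_Y^2(N) = \frac{1}{N-1}\sum_{k=1}^{N}\left(Y(k) - \hat{\mu}_Y\right)^2$ (7.14)

For on-line application, recursive forms are of interest. They are obtained, e.g., by subtracting $\hat{\mu}_Y(N)$ from $\hat{\mu}_Y(N+1)$, resulting in

$\hat{\mu}_Y(k) = \hat{\mu}_Y(k-1) + \frac{1}{k}\left[Y(k) - \hat{\mu}_Y(k-1)\right] \quad (1 \leq k \leq N)$ (7.15)

$\hat{\sigma}_Y^2(k) = \hat{\sigma}_Y^2(k-1) + \frac{1}{k}\left[\left(Y(k) - \hat{\mu}_Y(k-1)\right)^2 - \frac{k}{k-1}\,\hat{\sigma}_Y^2(k-1)\right]$
$\qquad\;\; = \frac{k-2}{k-1}\,\hat{\sigma}_Y^2(k-1) + \frac{1}{k}\left[Y(k) - \hat{\mu}_Y(k-1)\right]^2 \quad (2 \leq k \leq N)$ (7.16)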
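The recursions (7.15) and (7.16) translate directly into code. The following sketch (function name hypothetical) processes one sample at a time and, after the last sample, reproduces the batch estimates (7.13) and (7.14).

```python
def recursive_mean_var(samples):
    """Recursive mean (7.15) and variance (7.16); returns (mean, variance) after each sample.

    The variance is only defined from k = 2 on; 0.0 is used as a placeholder for k = 1.
    """
    mu, var = 0.0, 0.0
    history = []
    for k, y in enumerate(samples, start=1):
        mu_prev = mu
        mu = mu_prev + (y - mu_prev) / k                             # (7.15)
        if k >= 2:
            var = (k - 2) / (k - 1) * var + (y - mu_prev) ** 2 / k  # (7.16), uses mu(k-1)
        history.append((mu, var))
    return history
```

Only the previous estimates have to be stored, which is what makes this form suitable for on-line supervision.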


A further possibility is to limit the averaging to a time window of length w. The mean then becomes

$\hat{\mu}_Y(N) = \frac{1}{w}\sum_{k=N-w+1}^{N} Y(k)$ (7.17)

and in recursive form

$\hat{\mu}_Y(k) = \hat{\mu}_Y(k-1) + \frac{1}{w}\left[Y(k) - Y(k-w)\right]$ (7.18)

Correspondingly, the window estimate of the variance yields

$\hat{\sigma}_Y^2(N) = \frac{1}{w-1}\sum_{k=N-w+1}^{N}\left(Y(k) - \hat{\mu}_Y\right)^2$ (7.19)

and recursively

$\hat{\sigma}_Y^2(k) = \hat{\sigma}_Y^2(k-1) + \frac{1}{w-1}\left[Y^2(k) - Y^2(k-w) - 2\gamma(k)\,\hat{\mu}_Y(k-1) - \frac{1}{w}\gamma^2(k)\right]$
$\gamma(k) = Y(k) - Y(k-w)$ (7.20)
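A sketch of the sliding-window estimates (7.18) and (7.20), assuming the first window is initialized with the batch formulas (7.17) and (7.19); the function name is hypothetical. Since (7.20) needs the previous mean $\hat{\mu}_Y(k-1)$, the variance is updated before the mean.

```python
def sliding_mean_var(samples, w):
    """Window mean (7.18) and variance (7.20) over the last w samples.

    The first window is initialized with the batch formulas (7.17) and (7.19).
    Returns one (mean, variance) pair per complete window.
    """
    if w < 2 or len(samples) < w:
        raise ValueError("need w >= 2 and at least w samples")
    first = samples[:w]
    mu = sum(first) / w
    var = sum((y - mu) ** 2 for y in first) / (w - 1)
    out = [(mu, var)]
    for k in range(w, len(samples)):
        y_new, y_old = samples[k], samples[k - w]
        gamma = y_new - y_old  # gamma(k) in (7.20)
        # (7.20): mu still holds mu(k-1) at this point
        var += (y_new ** 2 - y_old ** 2 - 2.0 * gamma * mu - gamma ** 2 / w) / (w - 1)
        mu += gamma / w        # (7.18)
        out.append((mu, var))
    return out
```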


Still another way is averaging with exponential forgetting and forgetting factor $\lambda < 1$ (e.g. $\lambda = 0.95$)

$\hat{\mu}_Y(k) = \lambda\,\hat{\mu}_Y(k-1) + (1 - \lambda)\,Y(k)$ (7.21)

which corresponds to a frozen, i.e. constant, $k = k_1 = 1/(1 - \lambda)$ in (7.15), see [7.18]. Applied to (7.16), this leads to

$\hat{\sigma}_Y^2(k) = \frac{2\lambda - 1}{\lambda}\,\hat{\sigma}_Y^2(k-1) + (1 - \lambda)\left[Y(k) - \hat{\mu}_Y(k-1)\right]^2$ (7.22)

(7.21) corresponds to a discrete-time first-order low-pass filter for determining $E\{Y(k)\}$

$\hat{\mu}_Y(k) = Y_f(k) = -a_1 Y_f(k-1) + b_0 Y(k)$
with $a_1 = -\lambda = -e^{-T_0/T}$; $b_0 = 1 + a_1$ (7.23)

This leads to a signal flow as in Figure 7.4. The variance can then also be determined via

$\hat{\sigma}_Y^2(k) = E\{[Y(k) - \hat{\mu}_Y(k)]^2\} = E\{Y^2(k)\} - \hat{\mu}_Y^2(k)$ (7.24)

where $E\{Y^2(k)\}$ is also determined by a first-order low-pass filter

$\overline{Y^2}_f(k) = -a_1\,\overline{Y^2}_f(k-1) + b_0\,Y^2(k)$ (7.25)

see [7.15], [7.8].

Higher-order low-pass filters may also be used, however at the cost of detection time.
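The low-pass filter structure of Figure 7.4, i.e. (7.23) to (7.25), might be sketched as follows. Initializing both filter states with the first sample is an assumption made here to shorten the start-up transient; the function name is hypothetical.

```python
def lowpass_mean_var(samples, lam=0.95):
    """Mean and variance via two first-order low-pass filters, cf. (7.23) to (7.25).

    lam is the forgetting factor lambda = exp(-T0/T); a1 = -lam, b0 = 1 + a1.
    Both filter states start at the first sample (an assumption).
    """
    a1 = -lam
    b0 = 1.0 + a1
    mean_f = samples[0]       # Y_f(k), estimate of E{Y(k)}
    msq_f = samples[0] ** 2   # filtered Y^2(k), estimate of E{Y^2(k)}
    out = []
    for y in samples:
        mean_f = -a1 * mean_f + b0 * y              # (7.23)
        msq_f = -a1 * msq_f + b0 * y * y            # (7.25)
        out.append((mean_f, msq_f - mean_f ** 2))   # (7.24)
    return out
```

For an alternating signal of +1 and -1, the mean estimate settles near zero and the variance estimate near one; a smaller λ reacts faster but fluctuates more.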

If different variables $Y_i(k)$ have to be compared, the variation coefficient can be applied, [7.6]

$v_i(N) = \frac{\hat{\sigma}_i(N)}{\hat{\mu}_i(N)}$ (7.26)


7.3.2 Statistical tests for change detection 
a) Hypothesis testing 

For change detection, hypothesis tests known from the theory of statistics can be applied, [7.23], [7.24], [7.4], [7.26], [7.12]. In hypothesis testing one tests a hypothesis $H_0$ against one or more specified alternative hypotheses $H_1, H_2, \ldots$. For fault detection, mainly the following hypotheses are of interest:
Fig. 7.4. Determination of mean and variance by low-pass filters (LPF)


• null hypothesis (no fault)

$H_0: Y = Y_0$ (7.27)

• change hypothesis (fault)

$H_1: Y \neq Y_0$ (7.28)


where $Y_0$ is the nominal value of the considered variable, which is assumed to be random. A decision therefore has to be made as follows. Based on the assumption that the null hypothesis is true if no fault occurs, the null hypothesis is rejected and the alternative hypothesis $H_1$ is accepted if the sample of the random variable Y falls outside the region of acceptance. Otherwise, $H_0$ is accepted and $H_1$ is rejected.

If the probability density function p(Y) is known, the task of hypothesis testing is illustrated in Figure 7.5. Assuming that the true hypothesis $H_0$ is $Y = \mu_0$, the question is how much the estimate $\hat{\mu}_0$ must differ from $\mu_0$ for the hypothesis to be rejected. If $\hat{\mu}_0$ may take any value for acceptance, the probability is

$\int_{-\infty}^{\infty} p(Y)\,dY = 1$ (7.29)

If the region of acceptance is limited to $\hat{\mu} > \mu_{\alpha/2}$, the probability of rejection is

$P(\hat{\mu}_0 < \mu_{\alpha/2}) = \int_{-\infty}^{\mu_{\alpha/2}} p(Y)\,dY = \frac{\alpha}{2}$ (7.30)

and if acceptance is limited to $\hat{\mu} < \mu_{1-\alpha/2}$, the probability of rejection is

$P(\hat{\mu}_0 > \mu_{1-\alpha/2}) = \int_{\mu_{1-\alpha/2}}^{\infty} p(Y)\,dY = \frac{\alpha}{2}$ (7.31)

because of symmetry. Here α is called the level of significance, which is usually a small number, α = 0.05 or 0.01. Thus, the range of acceptance of the hypothesis is

• null hypothesis $H_0$: $\mu_{\alpha/2} < \hat{\mu}_0 < \mu_{1-\alpha/2}$

and the range of rejection

• change hypothesis $H_1$: $\hat{\mu}_0 < \mu_{\alpha/2}$ or $\hat{\mu}_0 > \mu_{1-\alpha/2}$
Fig. 7.5. Regions of rejection and acceptance for a symmetric hypothesis test of a random
variable Y 


The test described above is a two-sided test.

Many different methods have been developed for hypothesis testing; only some of them are considered here. A first class assumes that the probability distribution of the random variable is normal; these are called parametric tests. A second class makes no assumptions on the probability distribution; these are called distribution-free or nonparametric tests, [7.12]. In the following, mainly parametric tests are considered, like the t-test, the F-test and likelihood-ratio tests.


b) Test quantities 


The theory of statistical hypothesis testing distinguishes special test quantities, which themselves possess certain probability distributions. These distributions are mostly known only for normally distributed variables. Usually the mean μ and the standard deviation σ have to be tested. For testing the mean, the test quantity

$t = \frac{\frac{1}{N}\sum_{k=1}^{N} Y(k) - \mu_Y}{\hat{\sigma}_Y(N)/\sqrt{N}}$ (7.32)

is used, [7.14], [7.26], which is t-distributed with $f = N - 1$ degrees of freedom (N is the number of samples).

For testing the variance, the test quantity

$\chi^2 = \frac{(N-1)\,\hat{\sigma}_Y^2(N)}{\sigma_Y^2} = \frac{1}{\sigma_Y^2}\sum_{k=1}^{N}\left(Y(k) - \hat{\mu}_Y\right)^2$ (7.33)

is applied, which is $\chi^2$ distributed, because $\sum \Delta Y^2(k)/\sigma_Y^2$ shows a $\chi^2$ distribution with N degrees of freedom. If $\chi^2 > \chi^2_{N-1;\,\alpha}$, the null hypothesis is rejected, i.e. the variance has increased with probability $(1 - \alpha)$.
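Both test quantities can be computed directly from a sample, as in the following sketch (function names hypothetical). The critical values of the t and $\chi^2$ distributions still have to be taken from a table for the chosen significance level α; the sketch deliberately does not compute them.

```python
import math


def t_statistic(samples, mu0):
    """Test quantity (7.32): deviation of the sample mean from mu0 in units of sigma/sqrt(N)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((y - mean) ** 2 for y in samples) / (n - 1)
    return (mean - mu0) / (math.sqrt(var) / math.sqrt(n))


def chi2_statistic(samples, sigma0_sq):
    """Test quantity (7.33): (N - 1) * estimated variance / nominal variance sigma0_sq."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((y - mean) ** 2 for y in samples) / sigma0_sq
```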
Example 7.1: Testing of the mean


If a change $\Delta\mu = \hat{\mu}(N) - \mu_0$ has to be detected for a variable Y(k) with standard deviation σ = 1, then t > 1.96 has to be reached for large N and the significance level α = 0.05 (probability 95 %), which follows from a t-distribution table. The number of required samples N then follows from (7.32)

$N \geq \left(\frac{\sigma\, t_{\infty;\,0.05}}{\Delta\mu}\right)^2 = \left(\frac{1.96}{\Delta\mu}\right)^2$

One then obtains the required number of samples N ≥ 384 for Δμ = 0.1, N ≥ 96 for Δμ = 0.2 and N ≥ 43 for Δμ = 0.3 to detect the change with 95 % probability.

□ 


Example 7.2: Testing of the variance

The task consists of testing whether the standard deviation has increased from the previous value $\sigma_0 = 1$ to σ = 1.2, i.e. the variance from 1 to 1.44. The $\chi^2$ distribution table shows for the significance level α = 0.05 (95 % probability) and N = 5, 10, 30

$\chi^2_{0.05}(5) = 11.1; \quad \chi^2_{0.05}(10) = 18.3; \quad \chi^2_{0.05}(30) = 43.8$


(7.33) yields

$\chi^2(5) = \frac{4 \cdot 1.44}{1} = 5.76; \quad \chi^2(10) = 12.96; \quad \chi^2(30) = 41.8$

For N = 30, $\chi^2 = 41.8$ is only slightly below the table value 43.8. This means that roughly 30 or more samples are required to detect that the standard deviation has increased by 20 %.

□ 


c) Detection of changes in the mean: t-test


The variable Y(k) is assumed to have the mean $\mu_{Y0}$ before the change and to be observed with $N_0$ samples. After the change the mean is $\mu_{Y1}$ and $N_1$ samples are available.

A classical test for the comparison of two mean values is the t-test. Here it is assumed that the probability distribution is normal and that the variances before and after the change are unknown but equal. The test quantity is then, [7.4],

$t = \frac{\hat{\mu}_{Y0} - \hat{\mu}_{Y1}}{\sqrt{(N_0 - 1)\,\hat{\sigma}_{Y0}^2 + (N_1 - 1)\,\hat{\sigma}_{Y1}^2}}\,\sqrt{\frac{N_0 N_1 (N_0 + N_1 - 2)}{N_0 + N_1}}$ (7.34)

If $|t| > t_{\alpha;\,N_0+N_1-2}$, a change has occurred, where $t_{\alpha;\,N_0+N_1-2}$ is taken from a t-distribution table for the significance level α.
If the number of new measurements $N_1$ is assumed to be small compared to the previous measurements $N_0$, i.e. $N_1 \ll N_0$, then

$t \approx \frac{\hat{\mu}_{Y0} - \hat{\mu}_{Y1}}{\hat{\sigma}_{Y0}/\sqrt{N_1}}$ (7.35)

which is identical with the test quantity (7.32).
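A sketch of the two-sample test quantity (7.34), assuming equal but unknown variances before and after the change as stated above. The function names are hypothetical, and the critical value $t_{\alpha;\,N_0+N_1-2}$ again comes from a table.

```python
import math


def _mean_var(x):
    """Sample mean and unbiased variance, cf. (7.13) and (7.14)."""
    n = len(x)
    m = sum(x) / n
    return m, sum((v - m) ** 2 for v in x) / (n - 1)


def two_sample_t(before, after):
    """Test quantity (7.34) for samples before (N0) and after (N1) a suspected change."""
    n0, n1 = len(before), len(after)
    m0, v0 = _mean_var(before)
    m1, v1 = _mean_var(after)
    pooled = math.sqrt((n0 - 1) * v0 + (n1 - 1) * v1)
    return (m0 - m1) / pooled * math.sqrt(n0 * n1 * (n0 + n1 - 2) / (n0 + n1))
```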


d) Run-Sum tests for detection of changes in the mean 

Especially in the area of quality control, relatively simple run tests, also known as control charts, were developed, see, e.g., [7.22], [7.13]. They test the change of the mean, provided the mean $\mu_0$ and the standard deviation $\sigma_0$ before the change are known and the standard deviation does not change, $\sigma_1 = \sigma_0$. A simple one-sided run test checks, for example, whether one value exceeds the mean by more than three times the standard deviation or whether a certain number n of successive values lie above the mean, [7.9]. The run-sum tests make use of the cumulative sum (CUSUM) of a random variable. Examples are:

$RS_1(N) = \sum_{k=1}^{N}\left(Y(k) - \mu_0\right)$   deviation from a reference (7.36)

$RS_2(N) = \sum_{k=1}^{N}\left(Y(k) - Y(k-1)\right)$   successive differences (7.37)

One of the run-sum tests classifies the measured variable or the estimate of the mean into deviation bands as multiples ν of the standard deviation, Table 7.1. If the observed variable enters, e.g., the band $\mu_0 + 2\sigma < Y < \mu_0 + 3\sigma$, the score $sc_j = 2$ is assigned. The run sum is then the sum of the scores

$RS_3(N) = \sum_{k=1}^{N} sc_j(k)$ (7.38)

If a threshold $RS_{th}$ is exceeded, a change is detected. A run is terminated when a value of Y falls on the opposite side of the mean value, i.e. when the sign of the deviation from the mean changes.
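A two-sided run-sum test following Table 7.1 might look as follows. In this sketch the score is the band index clipped at ±3, a new run starts whenever the deviation from $\mu_0$ changes sign, and `threshold` ($RS_{th}$) is a hypothetical tuning parameter; the function name is also illustrative.

```python
def run_sum_test(samples, mu0, sigma0, threshold):
    """Two-sided run-sum test with the scores of Table 7.1.

    Returns the index k at which |RS_3| first exceeds the threshold, or None.
    """
    run_sum = 0
    run_sign = 0
    for k, y in enumerate(samples):
        d = y - mu0
        sign = 1 if d >= 0 else -1
        score = sign * min(int(abs(d) // sigma0), 3)  # band index 0..3 with sign
        if sign != run_sign:   # Y crossed the mean: the current run ends
            run_sum = 0
            run_sign = sign
        run_sum += score       # (7.38), accumulated within the current run
        if abs(run_sum) > threshold:
            return k
    return None
```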


e) Detection of changes in the variance: F-test 

A further classical test for the comparison of two variances is the F-test. It is assumed that both samples are normally distributed. The mean values need not be known; they can even be different. The test quantity is

$F(N_1, N_2) = \frac{\hat{\sigma}_{Y1}^2(N_1)}{\hat{\sigma}_{Y2}^2(N_2)}$ (7.39)
Table 7.1. Assigned scores for a two-sided run-sum test

Deviation band of Y     Score sc_j
> μ0 + 3σ               +3
> μ0 + 2σ               +2
> μ0 + 1σ               +1
> μ0 + 0σ               +0
< μ0 - 0σ               -0
< μ0 - 1σ               -1
< μ0 - 2σ               -2
< μ0 - 3σ               -3


As F follows an F-distribution with $(m_1, m_2)$ degrees of freedom ($m_1 = N_1 - 1$; $m_2 = N_2 - 1$), $F(m_1, m_2, \alpha)$ is taken from an F-distribution table for a significance level α. If

$F(N_1, N_2) > F(m_1, m_2, \alpha)$ (7.40)

then $\sigma_{Y1}^2 > \sigma_{Y2}^2$ with probability $(1 - \alpha)$.

The statistical tests discussed so far are applicable if both the number of measurements $N_0$ before the change and $N_1$ after the change are relatively large. However, a large $N_1$ contradicts the early detection of changes. Therefore the classical statistical tests are in general not directly suitable for fast change detection in real time.
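The F-test quantity (7.39) reduces to a ratio of two sample variances, e.g. as sketched below (helper names hypothetical). The comparison value $F(m_1, m_2, \alpha)$ must still come from an F-distribution table.

```python
def sample_variance(x):
    """Unbiased variance estimate, cf. (7.14)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def f_statistic(sample_1, sample_2):
    """Test quantity (7.39) with m1 = N1 - 1 and m2 = N2 - 1 degrees of freedom."""
    return sample_variance(sample_1) / sample_variance(sample_2)
```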


f) Likelihood ratio test for jump detection 


It is assumed that the Y(k) are statistically independent, that the type of probability density $p_Y(Y)$ is known, e.g. a normal distribution, and that the parameters of this density, e.g. $\mu_Y$, have to be estimated. $p_Y(\mu, \sigma)$ is then called a likelihood function, because it is a probability function in which the observations are regarded as fixed numbers and the parameters as the variables.

To test whether the observed variable Y(k) is more likely to be $Y_0$ or $Y_1$, one can assume two probability densities $p_Y(Y|H_0)$ and $p_Y(Y|H_1)$, compare Figure 7.3. Then one may determine the likelihood ratio

$\Lambda(Y) = \frac{p_Y(Y|H_1)}{p_Y(Y|H_0)}$ (7.41)

in order to compare both probability densities.

If $\Lambda(Y) > \Lambda_{th}$, $H_1$ is decided; if $\Lambda(Y) < \Lambda_{th}$, $H_0$ is taken to be true. As the logarithm is a monotonic function, the log-likelihood ratio can also be applied

$\ln\Lambda(Y) = \ln p_{Y1}(Y) - \ln p_{Y0}(Y)$ (7.42)

The computations then become simpler, and the decision for $H_1$ is just

$\ln\Lambda(Y) > \ln\Lambda_{th}$ or $\ln\Lambda(Y) - \ln\Lambda_{th} > 0$ (7.43)
i.e. a sign change determines whether $H_1$ or $H_0$ is true. This is the sequential probability ratio test (SPRT), according to [7.25], [7.10]. This test can now be further evaluated for different assumptions about the probability density function, [7.11].

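If the probability density is normal, the means $\mu_0$ and $\mu_1$ are known and it is assumed that the variance does not change, i.e. $\sigma_0^2 = \sigma_1^2 = \sigma^2$, then (7.42) leads to

$\ln\Lambda(Y) = -\frac{(Y(k) - \mu_1)^2}{2\sigma^2} + \frac{(Y(k) - \mu_0)^2}{2\sigma^2} = \frac{\mu_1 - \mu_0}{\sigma^2}\left[Y(k) - \frac{\mu_0 + \mu_1}{2}\right]$ (7.44)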


[7.1]. If the observations Y(k) are statistically independent, the likelihood ratio after a jump at time k' becomes

$\Lambda(Y)\big|_{k'}^{N} = \prod_{k=k'}^{N}\frac{p_Y(Y(k)\,|\,H_1)}{p_Y(Y(k)\,|\,H_0)}$ (7.45)

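Assuming a normal distribution and the known jump size $\Delta\mu = \mu_1 - \mu_0$, this leads to

$\ln\Lambda(Y)\big|_{k'}^{N} = \frac{\Delta\mu}{\sigma^2}\sum_{k=k'}^{N}\left[Y(k) - \mu_0 - \frac{\Delta\mu}{2}\right] = \frac{1}{\sigma^2}\,S_{k'}^{N}(\mu_0, \Delta\mu)$ (7.46)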


Therefore the test quantity

$S_i^N(\mu_0, \Delta\mu) = \Delta\mu\sum_{k=i}^{N}\left[Y(k) - \mu_0 - \frac{\Delta\mu}{2}\right]$ (7.47)

can be computed for a time window of length (N - i). With $s = S/\Delta\mu$, a recursive form becomes

$s_i(k) = s_i(k-1) + Y(k) - \mu_0 - \frac{\Delta\mu}{2} = s_i(k-1) + Y(k) - \frac{\mu_0 + \mu_1}{2}$ (7.48)

This change detection algorithm becomes

$s_i(k) = s_i(k-1) - \frac{\Delta\mu}{2}$ for $Y(k) = \mu_0$
$s_i(k) = s_i(k-1) + \frac{\Delta\mu}{2}$ for $Y(k) = \mu_1$

The incremental quantity

$\Delta s_i(k) = s_i(k) - s_i(k-1)$ (7.49)

therefore changes its sign after a jump has arisen in Y(k). The described detection method is also known as the Page-Hinkley stopping rule, [7.21], or cumulative sum algorithm, see also [7.3]. Note that different cumulative sum algorithms exist, see Section d).

The assumption of a known jump size and direction is rather unrealistic. For an unknown jump magnitude one can run two tests in parallel, assuming that $\mu_0$ is known: a minimal jump magnitude can be assumed with two signs, i.e. $+\Delta\mu$ and $-\Delta\mu$, [7.2].
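The two parallel tests for $+\Delta\mu$ and $-\Delta\mu$ can be sketched as below. The text leaves the exact stopping criterion open; this sketch assumes Page's classical form, in which an alarm is raised when the cumulative sum s(k) of (7.48) rises more than a threshold h above its running minimum. The minimal jump magnitude `delta_mu`, the threshold `h` and the function name are assumptions.

```python
def page_hinkley_two_sided(samples, mu0, delta_mu, h):
    """Run the cumulative-sum test (7.48) for +delta_mu and -delta_mu in parallel.

    Returns (k, '+') or (k, '-') at the first alarm, or None if no jump is detected.
    """
    s_up = s_dn = 0.0      # cumulative sums for a jump of +delta_mu and -delta_mu
    min_up = min_dn = 0.0  # running minima of the cumulative sums
    for k, y in enumerate(samples):
        s_up += y - mu0 - delta_mu / 2.0     # (7.48) with mu1 = mu0 + delta_mu
        s_dn += -(y - mu0) - delta_mu / 2.0  # mirrored test for mu1 = mu0 - delta_mu
        min_up = min(min_up, s_up)
        min_dn = min(min_dn, s_dn)
        if s_up - min_up > h:
            return k, "+"
        if s_dn - min_dn > h:
            return k, "-"
    return None
```

Before the jump both sums drift downwards by $\Delta\mu/2$ per sample; after a jump of the assumed sign, the corresponding sum turns upwards, which is the sign change of (7.49) made robust against noise by the threshold h.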
The statistical tests described in this section have shown that, in general, relatively restrictive assumptions have to be made, which are mostly not satisfied by experimentally generated residuals. In many cases these residuals are not statistically independent, not normally distributed and non-stationary, with unknown changes of the means and variances. Therefore one should first try the relatively simple tests from Section 7.3.1 and adaptive thresholds (see Section 7.5) and, if possible, obtain strong deflections of the features by a proper design of the fault detection method.


7.4 Change detection with fuzzy thresholds 

In many cases the binary decision between “normal” and “faulty” is somewhat artificial, because there is seldom a sharp border between both states, Figure 7.6a. Therefore fuzzy thresholds are a more realistic alternative for change detection. The fluctuating variable $Y_i(t)$ can also be described by a fuzzy set $\mu_n(Y)$ for the normal state, see Figure 7.6b. (Here the usual symbol μ is used for a membership function; it should not be confused with the mean value μ of the previous sections.) If the fuzzy set is selected as a triangle, the center describes the mean and the lower width is $\Delta = k\sigma_Y$, k = 2, 3, .... A membership function $\mu_{Y+}$ for “increased” is selected to obtain a gradual measure for exceeding a threshold, [7.20]. Matching the changed fuzzy set $\mu'(Y)$ with the fuzzy threshold $\mu_{Y+}$ leads to the exceeding degree

$\mu_Y = \max_Y\left[\min\left(\mu'(Y), \mu_{Y+}(Y)\right)\right]$ (7.50)

As a result, e.g., $\mu_Y = 0.6$ is obtained, which means that the threshold is reached to 60 %.

Depending on the selection of the threshold membership functions, this gradual measure gives more information on the severity of an abnormal state than binary thresholds.
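The exceeding degree (7.50) can be evaluated numerically on a grid of Y values. This sketch assumes a triangular fuzzy set $\mu'(Y)$ and a linear ramp for $\mu_{Y+}$; the grid range, the parameter names and the function names are illustrative.

```python
def triangle(y, center, half_width):
    """Triangular fuzzy set mu'(Y) with its peak at center and base width 2*half_width."""
    return max(0.0, 1.0 - abs(y - center) / half_width)


def ramp_up(y, start, full):
    """Membership function mu_Y+ for 'increased': 0 below start, 1 above full."""
    if y <= start:
        return 0.0
    if y >= full:
        return 1.0
    return (y - start) / (full - start)


def exceeding_degree(center, half_width, start, full, lo=0.0, hi=10.0, n=1000):
    """Degree (7.50): max over Y of min(mu'(Y), mu_Y+(Y)), evaluated on a grid."""
    grid = (lo + i * (hi - lo) / n for i in range(n + 1))
    return max(min(triangle(y, center, half_width), ramp_up(y, start, full)) for y in grid)
```

For a changed state centered at 5 with half-width 2 and a threshold ramp from 6 to 8, the two membership functions intersect at Y = 6.5, giving an exceeding degree of 0.25.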


7.5 Adaptive thresholds 

Process model-based fault-detection methods described in the next chapters use process models which do not fully agree with real processes due to model uncertainties. The generated residuals then deviate from zero even without faults. These deviations frequently depend on the amplitude and frequencies of the input excitation. Therefore the residuals may contain a static part which is proportional to the input U(t) and a dynamic part depending, e.g., on $\dot{U}(t)$. To cope with this problem, [7.16] introduced an adaptive threshold which uses a first-order high-pass filter (HPF) for enlarging the threshold, Figure 7.7. A proportional enlargement may be added by a constant $c_2$, [7.8]. A low-pass filter (LPF) is used to smooth the thresholds. The time constants $T_1$ and $T_2$ are selected according to the dominating time constant of the process; the ratio $T_2/T_1$ depends on the model uncertainty of the dynamics.

Figure 7.8 shows an example of the time behavior of an adaptive threshold. Adaptive thresholds were also proposed by [7.5], [7.7].
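Since the exact block structure of Figure 7.7 is not reproduced here, the following is only a plausible sketch of such an adaptive threshold. The raw threshold is assumed to be $c_1 + c_2|U(k)| + k_{hp}|U_{HP}(k)|$, where $U_{HP}$ is the output of a first-order high-pass filter with time constant $T_1$, and this sum is smoothed by a first-order low-pass filter with time constant $T_2$. Both filters are discretized by backward differences. The assignment of $T_1$ to the HPF and $T_2$ to the LPF, the gain $k_{hp}$ and all names are assumptions.

```python
def adaptive_threshold(u, t0, t1, t2, c1, c2, k_hp):
    """Hypothetical adaptive threshold driven by the process input U(k).

    t0: sampling time, t1: HPF time constant, t2: LPF time constant (assumed roles),
    c1: constant threshold, c2: proportional enlargement, k_hp: HPF gain (assumed).
    """
    a = t1 / (t1 + t0)  # HPF T1 s / (1 + T1 s), backward-difference discretization
    b = t0 / (t2 + t0)  # LPF 1 / (1 + T2 s), backward-difference discretization
    u_prev = u[0]
    u_hp = 0.0
    th = c1
    out = []
    for uk in u:
        u_hp = a * (u_hp + uk - u_prev)                # high-pass filtered input
        u_prev = uk
        th_raw = c1 + c2 * abs(uk) + k_hp * abs(u_hp)  # enlarged threshold
        th += b * (th_raw - th)                        # smoothed threshold
        out.append(th)
    return out
```

After a step in U the threshold first rises because of the high-pass part and then settles at $c_1 + c_2|U|$, which mimics the behavior intended for dynamic model errors.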
Fig. 7.6. Change detection for stochastic variables: (a) stochastic variable Y(t) with probability density function p(Y); $p_n(Y)$ is the normal state, $p'(Y)$ is the changed state, $\Delta Y_{tol}$ marks the (binary) threshold; (b) stochastic variable Y(t) as fuzzy set $\mu(Y)$; $\mu_n(Y)$ is the normal state, $\mu'(Y)$ is the changed state, $\mu_{Y+}$ is the fuzzy threshold for “increased” and $\mu_Y = 0.6$ is the degree of exceeding the threshold “increased”



Fig. 7.7. Generation of an adaptive threshold dependent on the process input excitation. The constant threshold is $th_{const} = c_1$

7.6 Plausibility checks 

A rough supervision of measured variables is sometimes performed by checking the plausibility of their indicated values. This means that the measurements are evaluated with regard to credible, convincing values and their compatibility with each other. Therefore, a single measurement is examined as to whether its sign is correct and its value lies within certain limits. This is also a limit check, however, usually with wide tolerances. If several measurements are available for the same process, the measurements can be related to each other with regard to their normal ranges by using logic rules, like
IF [$Y_{1\min} < Y_1(t) < Y_{1\max}$] THEN [$Y_{2\min} < Y_2(t) < Y_{2\max}$] (7.51)

For example, for a circulation pump with rotational speed n and pressure p one expects

IF [1000 rpm < n < 3000 rpm] THEN [3 bar < p < 8 bar]

The plausibility check can also be made dependent on the operating condition, like

IF [operating condition 1] THEN [$Y_{2\min} < Y_2(t) < Y_{2\max}$] (7.52)

One example is the oil pressure $p_{oil}$ of a combustion engine with speed n and cooling water temperature $\vartheta_{H_2O}$

IF [n < 1500 rpm] AND [$\vartheta_{H_2O}$ < 50 °C] THEN [3 bar < $p_{oil}$ < 5 bar] (7.53)

Hence, plausibility checks may be formulated by using rules with binary logic connectives like AND and OR. These rules and the ranges of the measurements allow a rough description of the expected behavior of the process under normal conditions. If these rules are not satisfied, either the process or the measurements are faulty. Further testing is then needed to localize the fault and its cause.
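Rules of the form (7.51) to (7.53) can be encoded as simple premise/conclusion pairs. The sketch below (class, function and key names are hypothetical) reports every rule whose premise holds but whose conclusion is violated. Such a violation only indicates that the process or a sensor is suspicious, not which one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Measurements = Dict[str, float]


@dataclass
class Rule:
    """IF premise THEN conclusion, cf. (7.51) to (7.53)."""
    name: str
    premise: Callable[[Measurements], bool]
    conclusion: Callable[[Measurements], bool]


def violated_rules(rules: List[Rule], m: Measurements) -> List[str]:
    """Names of all rules whose premise holds but whose conclusion does not."""
    return [r.name for r in rules if r.premise(m) and not r.conclusion(m)]


# Circulation pump: speed n in rpm, pressure p in bar
PUMP_RULE = Rule("pump pressure",
                 lambda m: 1000 < m["n"] < 3000,
                 lambda m: 3 < m["p"] < 8)

# Engine oil pressure (7.53): speed n in rpm, cooling water temperature in deg C, p_oil in bar
OIL_RULE = Rule("oil pressure",
                lambda m: m["n"] < 1500 and m["theta_h2o"] < 50,
                lambda m: 3 < m["p_oil"] < 5)
```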

These plausibility checks presuppose known ranges of the measured process variables under certain operating conditions and represent rough process models. If the ranges of the variables are made increasingly smaller, many rules would be required to describe the process behavior. It is then better to use mathematical process models in the form of equations to detect abnormalities. Therefore, plausibility tests can be seen as a first step towards model-based fault-detection methods.


7.7 Problems 

1) What are the advantages in combining limit checking for absolute values and for 
trends? 
2) The pressure p(t) in a water network has to be supervised at the end of a branch. Write down the equations for estimating the mean and variance with a time window and with exponential forgetting.

3) Which statistical tests can be applied for detecting changes in the mean and the variance of a stochastic variable Y(k)? Which parameters of the signals have to be known?

4) Which of the test methods are applicable for on-line real-time fault detection? 

5) Compare binary and fuzzy thresholds with regard to the selection of thresholds 
and interpretation of the result. 

6) State the rules for a plausibility check of a pneumatic flow valve if the measured variables are: manipulating air pressure $p_{air}$, valve position $U_{valve}$ and flow $m_{flow}$. The ranges of the variables are: $p_{air}$ = 0.2 ... 1.0 bar; $U_{valve}$ = 1 % ... 100 %; $m_{flow}$ = 0.1 ... 10 m³/h.



8 


Fault detection with signal models 


Many measured signals of processes show oscillations that are of harmonic or stochastic nature, or both. If changes of these signals are related to faults in the actuators, the process or the sensors, signal model-based fault-detection methods can be applied. Especially for machine vibration, the measurement of position, speed or acceleration allows, for example, the detection of imbalance or bearing faults (turbo machines), knocking (gasoline engines) and chattering (metal grinding machines). But signals from many other sensors, like electrical current, position, speed, force, flow and pressure, also frequently contain oscillations with a variety of frequencies higher than those of the process dynamics.

The task of fault detection by the analysis of signal models is summarized in Figure 8.1. By assuming special mathematical models for the measured signal, see Chapter 6, suitable features are calculated, for example amplitudes, phases, spectrum frequencies and correlation functions for a certain frequency bandwidth $\omega_{\min} < \omega < \omega_{\max}$ of the signal. A comparison with the observed features for normal behavior yields changes of the features, which are then considered as analytical symptoms.

The signal models can be divided into nonparametric models, like frequency spectra or correlation functions, and parametric models, like amplitudes for distinct frequencies or ARMA-type models. The following sections describe some signal-analysis methods for harmonic oscillations, stochastic signals and non-stationary signals, compare the survey presented in Figure 8.2. For more details see the special books on digital signal analysis, like [8.46], [8.12], [8.40], [8.29], [8.23], [8.28], [8.55].


8.1 Analysis of periodic signals 

It is now assumed that periodic signals y(t) are superimposed on a steady-state signal value $Y_{00}$ such that the measured absolute signal is, see Section 6.1,

$Y(t) = Y_{00} + y(t)$ (8.1)
Fig. 8.1. Scheme for the fault detection with signal models



Fig. 8.2. Survey of signal-analysis methods for signal model-based fault detection 


If the steady-state value is removed, only the signal y(t) has to be analyzed. Usually the periodic signal y(t) is composed of a usable signal part $y_u(t)$ and a noise part n(t)

$y(t) = y_u(t) + n(t)$ (8.2)

The usable signal $y_u(t)$ contains the dynamic signal components which have to be analyzed. The noise n(t) is assumed to have zero mean and to be uncorrelated with $y_u(t)$.

According to the theory of Fourier series, each periodic signal can be described by the superposition of harmonic components

$y_u(t) = \sum_{\nu=1}^{N} y_{0\nu}\, e^{-d_\nu t}\sin(\omega_\nu t + \varphi_\nu)$ (8.3)

Each component is determined by the amplitude $y_{0\nu}$, the frequency $\omega_\nu$, the phase angle $\varphi_\nu$ and the damping factor $d_\nu$. These parameters now have to be determined by a signal-analysis method. In many cases it is sufficient for fault detection to determine $\omega_\nu$ and $y_{0\nu}$.

8.1.1 Bandpass filtering 

The classical method of obtaining the amplitudes of the harmonic components as a function of frequency, i.e. the frequency spectrum, is to pass the signal through a number of analog bandpass filters with different center frequencies, see Figure 8.3, or through a filter whose center frequency is moved over a frequency range. The bandpass filters have a certain bandwidth, like the width of transmission between the 3 dB frequencies, and a certain steepness of the flanks, expressed, e.g., by the frequency band for 60 dB attenuation. The filters may be realized in analog or digital form. In the latter case the signals are converted in an A/D converter and then processed by recursive filter algorithms, allowing real-time filtering. The bandpass filters may have a constant absolute bandwidth or a constant relative (percentage) bandwidth. Constant bandwidth gives a uniform frequency resolution on a linear frequency scale. This is used if the considered frequency range is limited, for example to two decades. Constant percentage bandwidth gives uniform resolution on a logarithmic scale and is used for wide frequency ranges of three or more decades.
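A digital filter bank of this kind can be sketched in Python. The sketch assumes second-order recursive (IIR) bandpass sections with the coefficient formulas from R. Bristow-Johnson's "Audio EQ Cookbook" (constant 0 dB peak gain). A fixed quality factor Q makes the bandwidth $f_0/Q$ proportional to the center frequency, i.e. a constant percentage bandwidth. The detector is modeled as the RMS value of the second half of each filter output, so that the filter transient is discarded. Function name and parameters are illustrative.

```python
import math


def bandpass_rms(signal, fs, centers, q):
    """RMS output of a bank of second-order digital bandpass filters with constant Q."""
    levels = []
    for f0 in centers:
        w0 = 2.0 * math.pi * f0 / fs
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        b0, b2 = alpha / a0, -alpha / a0                          # b1 = 0 for this bandpass
        a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
        x1 = x2 = y1 = y2 = 0.0
        out = []
        for x in signal:
            y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2              # recursive filter algorithm
            x2, x1 = x1, x
            y2, y1 = y1, y
            out.append(y)
        steady = out[len(out) // 2:]                              # discard the transient
        levels.append(math.sqrt(sum(v * v for v in steady) / len(steady)))  # RMS detector
    return levels
```

For a unit-amplitude 100 Hz sine sampled at 2 kHz, the 100 Hz band delivers an RMS value close to $1/\sqrt{2}$, while bands at 50 Hz and 200 Hz with Q = 5 show much smaller values.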

Commercially available signal analyzers allow the filtered signal to be sent to a detector if one is interested in the power spectrum. The signal is then squared and integrated over a certain time to obtain an average power value. Taking the square root of this mean square value yields a root mean square (RMS) amplitude.

Figure 8.3 shows a scheme for a discrete-stepped bandpass filter analyzer. Here, the bandpass filters have different (stepped) center frequencies. The detector is connected sequentially to the filter outputs and measures the signal amplitude or power in each frequency band. The output is then amplified and recorded by a pen recorder or printer.

For narrow-band analysis it is more appropriate to use a single filter with tunable center frequency. The filter can have constant bandwidth or constant percentage bandwidth. The frequency spectrum is then obtained continuously in frequency. For more details see, e.g., [8.46], [8.43], [8.35], [8.49], [8.14].

8.1.2 Fourier analysis 

If the phase-shifted sinusoidal oscillations (8.3) are developed into a Fourier series, it holds

$$y(t) = \frac{a_0}{2} + \sum_{\nu=1}^{N} a_\nu \cos \nu\omega_0 t + \sum_{\nu=1}^{N} b_\nu \sin \nu\omega_0 t \tag{8.4}$$

If the frequencies $\nu\omega_0$ are known, the amplitudes can be determined by the Fourier coefficients


$$a_\nu = \frac{2}{T_p} \int_0^{T_p} y(t) \cos(\nu\omega_0 t)\, dt, \qquad b_\nu = \frac{2}{T_p} \int_0^{T_p} y(t) \sin(\nu\omega_0 t)\, dt \tag{8.5}$$

Fig. 8.3. Scheme for bandpass filtering with stepped filters

The Fourier series can be put in the complex form

$$y(t) = c_0 + \sum_{\nu=1}^{N} c_\nu e^{i\nu\omega_0 t} + \sum_{\nu=1}^{N} c_{-\nu} e^{-i\nu\omega_0 t} = \sum_{\nu=-\infty}^{\infty} c_\nu e^{i\nu\omega_0 t} \tag{8.6}$$

with the complex Fourier coefficients

$$c_\nu(i\nu\omega_0) = \frac{1}{T_p} \int_0^{T_p} y(t)\, e^{-i\nu\omega_0 t}\, dt \tag{8.7}$$

For $T_p \to \infty$, and thus $\omega_0 \to d\omega$ and $\nu\omega_0 \to \omega$, the periodic function becomes a non-periodic function, resulting in the Fourier transform $y(i\omega)$, (8.13).
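As a numerical check of (8.4) and (8.5), the integrals over one period can be replaced by sums over equally spaced samples. A minimal sketch, assuming a known period $T_p$ and a signal given as a Python function; the helper name `fourier_coefficients` and the test signal are illustrative only.

```python
import math

def fourier_coefficients(y, period, nu_max, m=256):
    """Approximate a_nu and b_nu of (8.5) for nu = 0..nu_max.

    The integral over one period is replaced by a sum over m equally spaced
    samples (rectangle rule); this is exact for trigonometric polynomials
    whose highest harmonic order is below m / 2.
    """
    w0 = 2.0 * math.pi / period
    dt = period / m
    samples = [(k * dt, y(k * dt)) for k in range(m)]
    a, b = [], []
    for nu in range(nu_max + 1):
        a.append(2.0 / period * dt * sum(v * math.cos(nu * w0 * t) for t, v in samples))
        b.append(2.0 / period * dt * sum(v * math.sin(nu * w0 * t) for t, v in samples))
    return a, b

Tp = 0.5
w0 = 2 * math.pi / Tp

def signal(t):
    # mean 1, first harmonic cosine with amplitude 2, third harmonic sine with 0.5
    return 1.0 + 2.0 * math.cos(w0 * t) + 0.5 * math.sin(3 * w0 * t)

a, b = fourier_coefficients(signal, Tp, 4)
# a[0] / 2 is the mean value; a[1] and b[3] recover the harmonic amplitudes
```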

8.1.3 Correlation functions

The autocorrelation function (ACF), or auto-covariance function, of stationary zero-mean disturbance signals and of periodic signals is given in general form by

$$R_{yy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^{T} y(t)\, y(t+\tau)\, dt \tag{8.8}$$


For periodic signals, the average has to be taken over an integer number of periods. Thus,

$$R_{yy}(\tau) = \lim_{n \to \infty} \frac{1}{nT_p} \int_0^{nT_p} y(t)\, y(t+\tau)\, dt \tag{8.9}$$

For a phase-shifted sinusoidal oscillation with $\omega_\nu = 2\pi/T_{p\nu}$ and an additive disturbance $n(t)$

$$y_\nu(t) = y_{0\nu} \sin(\omega_\nu t + \varphi_\nu) + n(t) \tag{8.10}$$


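the ACF becomes, [8.17],

$$R_{yy}(\tau) = \frac{y_{0\nu}^2}{2} \cos \omega_\nu \tau \tag{8.11}$$

and is thus again a periodic function. The result is independent of the phase shift $\varphi_\nu$. For $n \to \infty$, stationary disturbance components $n(t)$ that are uncorrelated with the oscillation, and oscillations with $\omega \neq \omega_\nu$, have no influence on the component (8.11) of the ACF; an uncorrelated disturbance only adds its own ACF, which decays with increasing $\tau$. Thus, the ACF is suitable for analyzing periodic signals with stochastic disturbance components.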
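The following sketch illustrates (8.9) to (8.11) with a sampled, noisy sinusoid: averaging over an integer number of periods, the ACF at a lag of one period returns $y_{0\nu}^2/2$ regardless of the phase, while the white disturbance only adds its variance at $\tau = 0$. Sampling rate, amplitude, phase, noise level and the helper name `acf` are assumptions for illustration.

```python
import math
import random

def acf(y, tau):
    """Time-average estimate of R_yy(tau), tau in samples, over the N - tau products."""
    n = len(y) - tau
    return sum(y[k] * y[k + tau] for k in range(n)) / n

random.seed(1)
fs, f_nu, y0, phi = 200.0, 5.0, 2.0, 0.7            # 40 samples per period
y = [y0 * math.sin(2 * math.pi * f_nu * k / fs + phi) + random.gauss(0.0, 0.5)
     for k in range(20000)]                           # 500 periods, noise std 0.5

r_zero = acf(y, 0)       # y0**2 / 2 plus noise variance 0.25
r_quarter = acf(y, 10)   # tau = T_p / 4: cos(pi / 2) = 0
r_period = acf(y, 40)    # tau = T_p: y0**2 / 2, independent of phi
```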

For the cross-correlation function between the input signal $u(t) = u_0 \sin \omega_\nu t$ of a linear system and the output signal $y(t)$, it holds

$$R_{yu}(\tau) = \lim_{n \to \infty} \frac{u_0}{nT_{p\nu}} \int_0^{nT_{p\nu}} y(t) \sin \omega_\nu (t+\tau)\, dt \tag{8.12}$$

Only oscillation components of $y(t)$ with $\omega = \omega_\nu$ have an influence. Note the similarity to the Fourier coefficient $b_\nu$, (8.5).
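Evaluating (8.12) at $\tau = 0$ and at a quarter period gives the in-phase and quadrature parts of the component at $\omega_\nu$, from which its amplitude and phase follow, while a harmonic at $3\omega_\nu$ and a constant offset drop out. A minimal sketch with assumed signal values; the integral is approximated by a sum over an integer number of periods, and the function name is hypothetical.

```python
import math

def cross_correlation(y, u0, w, t0, tau_samples, n_samples):
    """Discrete form of (8.12): u0 / (n T_p) * integral of y(t) sin(w (t + tau)) dt."""
    total = 0.0
    for k in range(n_samples):
        t = k * t0
        total += y(t) * math.sin(w * (t + tau_samples * t0))
    return u0 * total / n_samples

u0, w_nu, t0 = 1.0, 2 * math.pi * 10.0, 0.001      # 10 Hz input, 100 samples per period
n = 100 * 50                                       # 50 full periods

def y(t):
    # response at w_nu (amplitude 3, phase 0.4) plus a harmonic and an offset
    return 3.0 * math.sin(w_nu * t + 0.4) + 2.0 * math.sin(3 * w_nu * t) + 0.5

r_0 = cross_correlation(y, u0, w_nu, t0, 0, n)     # 1.5 u0 cos(0.4)
r_90 = cross_correlation(y, u0, w_nu, t0, 25, n)   # tau = T_p / 4: 1.5 u0 sin(0.4)
amplitude = 2.0 / u0 * math.hypot(r_0, r_90)
phase = math.atan2(r_90, r_0)
```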


8.1.4 Fourier transformation 

The Fourier transform of a non-periodic signal $y(t)$ is defined as

$$y(i\omega) = \int_{-\infty}^{\infty} y(t)\, e^{-i\omega t}\, dt \tag{8.13}$$

To ensure its convergence, the following condition must be fulfilled:

$$\int_{-\infty}^{\infty} |y(t)|\, dt < \infty \tag{8.14}$$

Figure 8.4 shows the amplitude densities $|y(i\omega)|$ for some examples of periodic signals of finite duration, for which the convergence condition is fulfilled, [8.37]. If the Fourier transform is applied to an oscillation of finite duration $T$, one obtains a peak at $\omega = \omega_\nu$, Figure 8.4c. The longer the duration $T$, the higher and narrower the amplitude density $|y(i\omega)|$ around $\omega = \omega_\nu$. A steady state oscillation with $T \to \infty$ yields $|y(i\omega_\nu)| \to \infty$. In this case, the convergence condition is no longer fulfilled.

Sampling the continuous-time signal $y(t)$ with the sampling time $T_0$ results, from (8.13) in the case of $y(t) = 0$ for $t < 0$, approximately in

$$y(i\omega) \approx T_0 \sum_{k=0}^{\infty} y(kT_0)\, e^{-i\omega k T_0} \tag{8.15}$$
Fig. 8.4. Amplitude densities of some signals of finite duration: (a) decaying oscillation; (b) growing and decaying oscillation; (c) finite periodic signal; (d) rectangular pulse (each shown as time course and Fourier transform)

Omitting the constant $T_0$ yields the discrete Fourier transform (DFT)

$$y_D(i\omega) = \sum_{k=0}^{\infty} y(kT_0)\, e^{-i\omega k T_0} \tag{8.16}$$

Restricting its application to a finite measuring interval $0 \le k \le N-1$ gives

$$y_D(i\omega) = \sum_{k=0}^{N-1} y(kT_0)\, e^{-i\omega k T_0} = \sum_{k=0}^{N-1} y(kT_0) \cos \omega k T_0 - i \sum_{k=0}^{N-1} y(kT_0) \sin \omega k T_0 = \mathrm{Re}\{y_D(i\omega)\} + i\, \mathrm{Im}\{y_D(i\omega)\} \tag{8.17}$$

The discrete amplitude spectrum can then be calculated from

$$|y_D(i\omega)| = \sqrt{\mathrm{Re}^2\{y_D(i\omega)\} + \mathrm{Im}^2\{y_D(i\omega)\}} \tag{8.18}$$

and the discrete phase spectrum from

$$\alpha_D(i\omega) = \arctan \frac{\mathrm{Im}\{y_D(i\omega)\}}{\mathrm{Re}\{y_D(i\omega)\}} \tag{8.19}$$


Introducing the abbreviation

$$z = e^{i\omega T_0} \tag{8.20}$$

one obtains the z-transform

$$y_D(z) = \sum_{k=0}^{N-1} y(kT_0)\, z^{-k} \tag{8.21}$$

For each angular frequency $\omega$, $2N$ multiplications and $2(N-1)$ additions are necessary, and the whole sum has to be recomputed for every frequency of interest. Therefore, the computing effort is relatively high.
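Direct evaluation of (8.17) to (8.19) at a single frequency shows where the $2N$ multiplications per frequency come from. A minimal sketch; `math.atan2` is used instead of a plain arctangent so that the phase of (8.19) lands in the correct quadrant, and the function name and test signal are assumptions.

```python
import math

def dft_point(y, omega, t0):
    """Evaluate (8.17)-(8.19) for one angular frequency omega.

    Needs 2N multiplications (plus cos/sin evaluations); a full spectrum
    of N frequencies therefore costs on the order of N**2 operations.
    """
    re = sum(yk * math.cos(omega * k * t0) for k, yk in enumerate(y))
    im = -sum(yk * math.sin(omega * k * t0) for k, yk in enumerate(y))
    amplitude = math.hypot(re, im)       # (8.18)
    phase = math.atan2(im, re)           # (8.19), quadrant-correct
    return re, im, amplitude, phase

t0 = 0.01
y = [math.sin(2 * math.pi * 1.0 * k * t0) for k in range(100)]   # exactly one period
re, im, amp, ph = dft_point(y, 2 * math.pi * 1.0, t0)
```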


8.1.5 Fast Fourier transformation (FFT) 


A reduction of the long computing time of the DFT can be achieved by utilizing the cyclic properties of the rotating pointers (phasors) in the complex plane, [8.7], [8.5]. The analysis is then restricted to re-sorting data and multiplications with precalculated sine and cosine values. The angular frequencies are also discretized, with a spacing of $\Delta\omega$. With a finite measuring duration $T = NT_0$, the lowest describable angular frequency is $\omega_{min} = 2\pi/T = 2\pi/(NT_0)$, and the highest angular frequency is $\omega_{max} = \pi/T_0$ according to the Shannon sampling theorem. The frequency resolution becomes

$$\omega = n\Delta\omega \quad \text{with} \quad \Delta\omega = \frac{2\pi}{NT_0} \tag{8.22}$$


Using (8.17) gives the series

$$y_D(in\Delta\omega) = \sum_{k=0}^{N-1} y(kT_0)\, e^{-ikn\Delta\omega T_0} = \sum_{k=0}^{N-1} y(kT_0)\, W_N^{kn} \tag{8.23}$$

with constant, data-independent complex factors $W_N^{kn}$, where

$$W_N = e^{-i\Delta\omega T_0} = e^{-i2\pi/N} = \text{const.} \tag{8.24}$$

If several data sets of equal length are analyzed, the factors $W_N^{kn}$ can be precalculated and stored as sine/cosine tables for different data set lengths $N$ in order to save computing time, e.g. for real-time processing. In addition, the following symmetry property holds:

$$W_N^2 = e^{-i2 \cdot 2\pi/N} = e^{-i2\pi/(N/2)} = W_{N/2} \tag{8.25}$$

A decomposition of the sample set $y(k)$ into two parts, one containing only the even-numbered samples and the other only the odd-numbered samples,

$$y_{even} = y(2k); \quad y_{odd} = y(2k+1); \quad k = 0, \ldots, \frac{N}{2} - 1 \tag{8.26}$$


yields two sub-sequences from (8.23) to (8.25)

$$y_{even}(n) = \sum_{k=0}^{N/2-1} y(2k)\, W_N^{2kn} = \sum_{k=0}^{N/2-1} y(2k)\, W_{N/2}^{kn}$$

$$y_{odd}(n) = \sum_{k=0}^{N/2-1} y(2k+1)\, W_N^{(2k+1)n} = W_N^{n} \sum_{k=0}^{N/2-1} y(2k+1)\, W_{N/2}^{kn} \tag{8.27}$$

Thus, the entire series of the signal $y(k)$ can be written as

$$y_D(n) = \sum_{k=0}^{N/2-1} \left\{ y(2k)\, W_N^{2kn} + y(2k+1)\, W_N^{(2k+1)n} \right\} = y_{even}(n) + y_{odd}(n) \tag{8.28}$$

This permits the calculation via two sub-sequences, each with half the data set length. According to (8.28), the decomposition can be continued as long as the length of the sub-sequences is even. The symmetry can be exploited completely if the data set length $N$ is a power of two, $N = 2^\nu$. In this case, the evaluation of (8.23) reduces to a rearrangement of $y(k)$ and multiplications with the precalculable complex values $W_N^{kn}$. This corresponds to a computing effort of $4N \lg N/\lg 2 = 4N \log_2 N$ real multiplications and the same number of real additions for this FFT.

According to the order of the operations, one distinguishes between Cooley-Tukey algorithms (re-sorting with subsequent multiplication) and Sande-Tukey algorithms (multiplication of the data with sine/cosine values and subsequent re-sorting), [8.41].

A comparison of the computing effort of the DFT and the FFT for a fixed data set length $N$ shows the strong increase of the DFT effort relative to the FFT with increasing length $N$, Table 8.1. Due to the large savings in computing time, the disadvantage of the specific data set length (a power of two) required by the FFT is usually accepted.
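A recursive radix-2 FFT following the even/odd decomposition (8.26) to (8.28) can be sketched as below and checked against the direct sum (8.23). This is a textbook decimation-in-time sketch for illustration, not an optimized implementation; besides (8.25) it uses the standard property $W_N^{n+N/2} = -W_N^n$ to obtain the second half of the spectrum, and the function names are hypothetical.

```python
import cmath
import random

def fft_radix2(y):
    """Recursive radix-2 FFT: split into even/odd samples, combine with W_N**k."""
    n = len(y)
    if n == 1:
        return [complex(y[0])]
    if n % 2:
        raise ValueError("data set length must be a power of two")
    even = fft_radix2(y[0::2])          # transform of y(2k), length N/2
    odd = fft_radix2(y[1::2])           # transform of y(2k+1), length N/2
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # W_N**k times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t                    # W_N**(k+N/2) = -W_N**k
    return out

def dft(y):
    """Direct evaluation of (8.23): about N**2 complex multiplications."""
    n = len(y)
    return [sum(y[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

random.seed(0)
data = [random.uniform(-1.0, 1.0) for _ in range(64)]
max_error = max(abs(a - b) for a, b in zip(fft_radix2(data), dft(data)))
```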


Table 8.1. Comparison of required calculations for DFT and FFT

Data set length N    Operation          DFT          FFT
128                  multiplications    33282        3584
                     additions          256          3584
1024                 multiplications    2101250      40960
                     additions          2048         40960
4096                 multiplications    33570818     196608
                     additions          8192         196608

Example 8.1:

The signal processor DSP32C (AT&T) needs a computing time of 80 ns, at a clock frequency of $f_c = 50$ MHz ($T_c = 20$ ns), for one calculation step (floating-point addition and/or multiplication). Due to its specific architecture (multiplier-adder cascade), an addition and a multiplication can be executed together within one calculation step. Since the FFT requires the same number of both operations, half of the computing time can be saved. With the DFT, however, only minor effects are noticeable, see Table 8.1.


Table 8.2. Comparison of the required computing time of a signal processor for DFT and FFT

N       DFT        FFT         Factor DFT/FFT
128     26.8 ms    2.85 ms     94
1024    1.68 s     32.75 ms    513
4096    26.86 s    0.1575 s    1705


Table 8.2 shows that the time-saving factor becomes larger with increasing data set length.


□ 


Because the FFT successively exploits symmetries, the data set length $N$ is restricted to a power of two ($N = 2^\nu$). Often, $N = 1024 = 2^{10}$ is used. Whenever a data set does not meet this requirement, it must be either truncated or padded with zeros (“zero padding”) up to the next power of two, which means a corruption of the raw data.
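Zero padding as described above can be sketched as follows; note that the frequency spacing (8.22) is then computed with the padded length, although the appended samples carry no measured information. The helper name and the 1 ms sampling time are assumptions.

```python
import math

def zero_pad_to_power_of_two(y):
    """Append zeros up to the next power of two (hypothetical helper)."""
    n = 1
    while n < len(y):
        n *= 2
    return list(y) + [0.0] * (n - len(y))

t0 = 0.001                                       # assumed sampling time of 1 ms
measured = [1.0] * 1000                          # 1000 measured samples
padded = zero_pad_to_power_of_two(measured)      # 1024 samples, the last 24 are zeros
delta_omega = 2 * math.pi / (len(padded) * t0)   # spacing (8.22) with N = 1024
```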

Another disadvantage of the DFT and FFT arises from the selected representation as a discrete spectrum. Due to its finite curvature behavior, sharp spectral lines (“peaks”), which may arise from discrete sinusoidal oscillation components, cannot be modelled accurately. A data set of finite length in the time domain ($k = 0 \ldots N-1$) can be generated from an infinitely long data set by a rectangular windowing function $f_{rec}$, with

$$f_{rec}(kT_0) = \begin{cases} 1 & \text{for } 0 \le k \le N-1 \\ 0 & \text{else} \end{cases} \tag{8.29}$$

see Figure 8.5. Instead of the sequence $y(k)$ of the real measured signal, a sequence of finite duration $y_{tr}(k)$ is transformed into the frequency domain, which results from the multiplication of the signal and the windowing function

$$y_{tr}(k) = f_{rec}(k)\, y(k) \tag{8.30}$$


In the frequency domain, this operation leads to a convolution of the spectrum of the measured signal with the Fourier transform of the rectangular window, which is a sinc function depending on the data set length $N$. Because this convolution extends over the entire frequency range, the amplitudes of all discrete frequency components are smeared (leakage effect), see, e.g. [8.23].
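The leakage effect can be reproduced with a short sketch: a sinusoid with an integer number of periods in the window falls exactly on one DFT bin, whereas a half-bin offset smears its energy over all bins. The signal frequencies (8 and 8.5 periods per window) and $N = 64$ are assumed; a direct DFT is used for brevity.

```python
import cmath
import math

def amplitude_spectrum(y):
    """|y_D(i n dw)| for n = 0..N-1 by a direct DFT (an FFT returns the same values)."""
    n = len(y)
    return [abs(sum(y[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n)))
            for m in range(n)]

N = 64
on_bin = amplitude_spectrum([math.sin(2 * math.pi * 8.0 * k / N) for k in range(N)])
off_bin = amplitude_spectrum([math.sin(2 * math.pi * 8.5 * k / N) for k in range(N)])
# on_bin: only bins 8 and N - 8 are non-zero (value N / 2 = 32)
# off_bin: the rectangular window smears the line over all bins (leakage)
```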

A reduction of the leakage effect can often be achieved by introducing specific windowing functions $f_{win}(k)$, which are multiplied with the measured signal prior to the transformation, as shown in Figure 8.6:

$$y_{tr}(k) = f_{win}(k)\, y(k) \tag{8.31}$$

These windowing functions $f_{win}(k)$ change their values continuously from 1 in the middle of the measuring interval to 0 at the edges. Besides reducing the leakage effect, however, they have the disadvantage of a diminished resolution of the resulting spectrum, because the signal values at the edges are corrupted.



Fig. 8.5. Rectangular windowing function $f_{rec}(k)$

Fig. 8.6. Generalized windowing function $f_{win}(k)$


Example 8.2:

The sinusoidal oscillation

$$y(t) = 1\,\text{V} \sin(2\pi \cdot 1\,\text{Hz} \cdot t)$$

is sampled ($T_0 = 200$ ms) within the measuring interval $k = 0 \ldots N-1$. The amplitude spectrum is estimated by means of the FFT from the corresponding sampled sequence $y(k)$:

a) Without additional windowing function $f_{win}$:

$$y(n\Delta\omega) = \text{FFT}\{f_{rec}(k)\, y(k)\}$$

From Figure 8.7, it becomes evident that the limitation of the measuring interval to $N$ samples, particularly for small values of $N$ ($N = 128$), leads to a significant estimation error of the signal amplitude and signal frequency.

b) With additional windowing function (Hanning window):

$$y(n\Delta\omega) = \text{FFT}\{f_{win}(k)\, f_{rec}(k)\, y(k)\}, \qquad f_{win}(k) = 0.5\left[1 - \cos\left(\frac{2\pi k}{N-1}\right)\right]$$

The additional Hanning window improves the frequency and amplitude estimation for small values of $N$ ($N = 128$), as illustrated in Figure 8.8. The influence of the leakage effect on the estimation results decreases with increasing data set length $N$.

For further details on using the FFT, refer, for instance, to [8.1], [8.5], [8.34], [8.47], [8.45], [8.23], [8.28], [8.55].



Fig. 8.7. FFT of a sinusoidal oscillation with 1 Hz, $y(k)$, for different data set lengths $N$ (frequency axis $f$ [Hz] from 0.8 to 1.2)


□ 
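The setting of Example 8.2 ($T_0 = 200$ ms, $N = 128$, 1 V at 1 Hz) can be recomputed with the sketch below, which picks the largest DFT bin with and without a Hanning window. Amplitudes are normalized by the sum of the window values, a common convention assumed here, and the function name is hypothetical. With plain peak picking both windows select the same bin (1.0156 Hz), so the frequency improvement reported in the example is not reproduced by this simple sketch; the amplitude estimate and the far-off leakage, however, clearly improve with the Hanning window.

```python
import cmath
import math

def peak_estimate(y, window, t0):
    """Largest DFT bin of the windowed data -> (frequency [Hz], amplitude, spectrum).

    The amplitude is normalized by the sum of the window values (assumed
    convention), so an on-bin sinusoid of amplitude 1 gives 1 for any window.
    """
    n = len(y)
    yw = [w * v for w, v in zip(window, y)]
    spec = [abs(sum(yw[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n)))
            for m in range(n // 2)]
    m_peak = max(range(1, n // 2), key=lambda m: spec[m])
    return m_peak / (n * t0), 2.0 * spec[m_peak] / sum(window), spec

t0, N = 0.2, 128                                                     # as in Example 8.2
y = [1.0 * math.sin(2 * math.pi * 1.0 * k * t0) for k in range(N)]   # 1 V, 1 Hz
rect = [1.0] * N
hann = [0.5 * (1.0 - math.cos(2 * math.pi * k / (N - 1))) for k in range(N)]

f_rect, a_rect, s_rect = peak_estimate(y, rect, t0)
f_hann, a_hann, s_hann = peak_estimate(y, hann, t0)
# 1 Hz lies between two bins (25.6 bins from zero); a_hann is closer to 1 V than a_rect
```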

If one is interested in the power spectrum of the signal $y(kT_0)$ based on the discrete Fourier transform, one can first determine the discrete amplitude spectrum $|y_D(i\omega)|$, (8.18), by using the FFT, and then use the relation for the periodogram, [8.24], [8.27], [8.15],

$$P_y(n\Delta\omega) = \frac{1}{N}\, |y_D(n\Delta\omega)|^2 \tag{8.32}$$
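A minimal sketch of the periodogram (8.32), using a direct DFT instead of an FFT for brevity. As a consistency check, the periodogram values sum to the signal energy (Parseval's relation for the DFT). The test signal and function name are assumptions.

```python
import cmath
import math

def periodogram(y):
    """P_y(n dw) = |y_D(n dw)|**2 / N, eq. (8.32), with a direct DFT."""
    n = len(y)
    spectrum = [sum(y[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
                for m in range(n)]
    return [abs(c) ** 2 / n for c in spectrum]

y = [math.cos(2 * math.pi * 4 * k / 32) + 0.3 for k in range(32)]   # assumed test signal
p = periodogram(y)
# Parseval: sum(p) equals sum(v * v for v in y)
```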
Fig. 8.8. FFT of a sinusoidal oscillation with 1 Hz for different data set lengths $N$ and additional Hanning window

8.1.6 Maximum entropy spectral estimation 

Most problems of the FFT could be solved if the course of the measured signal outside the measuring interval were known. For this reason, [8.6] searched for an approach to predict the unknown signal course from the known measured values without making any a priori assumptions about the signal course. This estimation of the values with maximum uncertainty concerning the signal course led to the term maximum entropy and to a substantially improved spectral estimation, which is especially suitable for fault detection based on a few frequencies.

a) Parametric signal models in the frequency domain

As an approach for a parametric signal model in the frequency domain, a fictitious form filter $F(z)$ or $F(i\omega)$ is used, which is excited by the Kronecker delta pulse

$$\delta(k) = \begin{cases} 1 & \text{for } k = 0 \\ 0 & \text{for } k \neq 0 \end{cases} \tag{8.33}$$

to generate steady state oscillations $y(k)$, Figure 8.9. The dynamic behavior of the form filter is to be determined in such a way that it yields

$$y(z) = F(z)\, \delta(z) = F(z) \quad \text{with} \quad \delta(z) = 1 \tag{8.34}$$

i.e. the frequency response of the form filter $F(z)$ is identical to the amplitude spectrum of the measured signal $y(z)$. The same applies to the power spectral density

$$S_{yy}(\omega) = |F(i\omega)|^2\, S_{\delta\delta}(\omega) = |F(i\omega)|^2 \quad \text{with} \quad S_{\delta\delta}(\omega) = 1 \;\; \forall\, \omega \tag{8.35}$$

Three possible parametric model approaches for such filters can be distinguished, [8.4]. The MA (moving average) model is

$$F_{MA}(z) = \beta_0 + \beta_1 z^{-1} + \ldots + \beta_n z^{-n} \tag{8.36}$$

Fig. 8.9. Generation of steady state oscillations $y(k)$ by means of a fictitious form filter $F(z)$ and excitation with a $\delta$-impulse

Here, the signal spectrum is approximated by a polynomial of limited order $n$. Thus, the spectrum can only represent limited variations of the amplitude and is unsuitable for the modelling of periodic signals, whose amplitude spectrum consists only of discrete peaks. In the time domain, an MA approach corresponds to the filter difference equation

$$y(k) = \beta_0 \delta(k) + \beta_1 \delta(k-1) + \ldots + \beta_n \delta(k-n) \tag{8.37}$$

A pure autoregressive (AR) model

$$F_{AR}(z) = \frac{\beta_0}{1 + \alpha_1 z^{-1} + \ldots + \alpha_n z^{-n}} \tag{8.38}$$

is able to approximate sharp spectral lines of periodic signals by the poles of the denominator polynomial. Thus, it is particularly suitable for estimating the spectra of harmonic oscillations, [8.36]. The corresponding filter difference equation is

$$\beta_0 \delta(k) = y(k) + \alpha_1 y(k-1) + \ldots + \alpha_n y(k-n) \tag{8.39}$$

After a single excitation by a $\delta$-pulse, the subsequent course of $y(k)$ depends only on its past values $y(k-i)$.
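The AR difference equation (8.39) can generate a sustained oscillation from a single $\delta$-pulse when its poles lie on the unit circle. The sketch below uses an assumed second-order model with $\alpha_1 = -2\cos\omega T_0$ and $\alpha_2 = 1$, for which the response is $\sin((k+1)\omega T_0)/\sin\omega T_0$; the function name is hypothetical.

```python
import math

def ar_impulse_response(alphas, beta0, n_samples):
    """Run (8.39) after one Kronecker delta:
    y(k) = beta0 * delta(k) - alpha_1 y(k-1) - ... - alpha_n y(k-n)."""
    y = []
    for k in range(n_samples):
        value = beta0 if k == 0 else 0.0
        for i, alpha in enumerate(alphas, start=1):
            if k - i >= 0:
                value -= alpha * y[k - i]
        y.append(value)
    return y

w_t0 = 2 * math.pi * 0.05                  # omega * T_0: 20 samples per period
alphas = [-2.0 * math.cos(w_t0), 1.0]      # pole pair exactly on the unit circle
y = ar_impulse_response(alphas, 1.0, 200)  # sustained oscillation, no decay
```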

The mixed ARMA filter model approach

$$F_{ARMA}(z) = \frac{\beta_0 + \beta_1 z^{-1} + \ldots + \beta_p z^{-p}}{1 + \alpha_1 z^{-1} + \ldots + \alpha_n z^{-n}} \tag{8.40}$$

is given by the ARMA difference equation

$$y(k) + \alpha_1 y(k-1) + \ldots + \alpha_n y(k-n) = \beta_0 \delta(k) + \ldots + \beta_p \delta(k-p) \tag{8.41}$$

and combines the MA and AR model approaches.

However, strong convergence problems arise in the estimation as a consequence of doubling the number of parameters ($\beta_j$, $\alpha_i$). For this type of model, more complex and more specialized estimation procedures are described in the literature, see [8.26], which, however, yield worse results for periodic signals (mainly autoregressive signal components) than an AR model approach.

A model structure of the form filter $F(z)$ can generally also be derived for periodic signals by the method of maximizing the entropy, in the form of a pure AR model for the power spectral density $S_{yy}(z)$, [8.8], [8.48]

$$S_{yy}(z) = F(z)\, F(z^{-1})\, S_{\delta\delta}(z) = \frac{\beta_0^2}{\left|1 + \sum_{i=1}^{n} \alpha_i z^{-i}\right|^2} \tag{8.42}$$

By estimating the coefficients $\alpha_i$ and $\beta_0$ from the measured signal $y(k)$, one obtains a parametric, autoregressive model in the frequency domain for the power spectral density $S_{yy}(\omega)$, which is characterized by $n + 1$ parameters (typically $n = 4 \ldots 30$).
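Evaluating (8.42) on the unit circle $z = e^{i\omega T_0}$ shows how a pole pair close to the unit circle produces a sharp spectral line. A minimal sketch with an assumed pole radius of 0.95 and pole angle of 0.5 rad; the spectral peak lies close to, but not exactly at, the pole angle.

```python
import cmath
import math

def ar_psd(alphas, beta0, omega_t0):
    """S_yy of (8.42) at z = exp(i omega T_0); omega_t0 is the product omega * T_0."""
    z = cmath.exp(1j * omega_t0)
    denominator = 1.0 + sum(alpha * z ** (-i) for i, alpha in enumerate(alphas, start=1))
    return beta0 ** 2 / abs(denominator) ** 2

r, theta = 0.95, 0.5                           # assumed pole radius and angle
alphas = [-2.0 * r * math.cos(theta), r * r]   # (1 - p z^-1)(1 - conj(p) z^-1)
grid = [i * 0.001 for i in range(1, 3142)]     # 0 < omega T_0 < pi
peak = max(grid, key=lambda w: ar_psd(alphas, 1.0, w))
```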

b) Determination of the coefficients

Stochastic signal components can be suppressed in the estimation if, instead of the signal

$$y(t) = \sum_{\nu=1}^{m} y_{0\nu}\, e^{-d_\nu t} \sin(\omega_\nu t + \varphi_\nu) \tag{8.43}$$

its autocorrelation function $R_{yy}(\tau)$ is used:

$$R_{yy}(\tau) = E\{y(t)\, y(t+\tau)\} = \sum_{\nu=1}^{m} \frac{y_{0\nu}^2}{2}\, e^{-d_\nu \tau} \cos(\omega_\nu \tau) \tag{8.44}$$

Since the autocorrelation function $R_{yy}(\tau)$ of a periodic signal $y(t)$ is again a periodic function in $\tau$ of the type (8.43), a form filter model according to (8.34) can be assumed for this function as well:

$$R_{yy}(z) = F(z)\, \delta(z) \tag{8.45}$$

The eigen-behavior of $y(t)$, represented by its $m$ characteristic frequencies $\omega_\nu$ and damping coefficients $d_\nu$, is also contained in $R_{yy}(\tau)$. However, the phase information is lost, and the amplitudes of the ACF become

$$R_{0\nu} = 0.5\, y_{0\nu}^2 \tag{8.46}$$

The approach of a general form filter model of the ARMA type (8.41) yields a filter difference equation for the ACF of the measured signal with $m$ eigenfrequencies and thus of order $n = 2m$

$$R_{yy}(\tau) = -\alpha_1 R_{yy}(\tau-1) - \alpha_2 R_{yy}(\tau-2) - \ldots - \alpha_n R_{yy}(\tau-2m) + \beta_0 R_{\delta\delta}(\tau) + \beta_1 R_{\delta\delta}(\tau-1) + \ldots + \beta_{2m-1} R_{\delta\delta}(\tau-2m+1) + R_{nn}(\tau) \tag{8.47}$$

if an additive, uncorrelated disturbance signal $n(t)$ with zero mean is taken into account. Its autocorrelation function $R_{nn}(\tau)$ gives only a constant contribution for $\tau = 0$.

The ARMA signal model (8.47) reaches the steady state after $\tau = 2m$ steps, after which all $\beta$ terms in (8.47) drop out. From then on, the ACF $R_{yy}(\tau)$ proceeds in the form of a stationary steady state oscillation and can be described exclusively by the AR part of (8.47).

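In the case of the AR model, the eigen-behavior of the autocorrelation function can be expressed by

$$R_{nn}(\tau) = R_{yy}(\tau) + \alpha_1 R_{yy}(\tau-1) + \alpha_2 R_{yy}(\tau-2) + \ldots + \alpha_n R_{yy}(\tau-2m) \tag{8.49}$$

For different shifts $\tau$, this relationship yields a system of equations for the determination of the coefficients $\alpha_i$ and $\alpha_0$:

$$\begin{bmatrix} R_{yy}(0) & R_{yy}(1) & \cdots & R_{yy}(2m) \\ R_{yy}(1) & R_{yy}(0) & \cdots & R_{yy}(2m-1) \\ \vdots & \vdots & \ddots & \vdots \\ R_{yy}(2m) & R_{yy}(2m-1) & \cdots & R_{yy}(0) \end{bmatrix} \begin{bmatrix} 1 \\ \alpha_1 \\ \vdots \\ \alpha_{2m} \end{bmatrix} = \begin{bmatrix} \alpha_0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{8.50}$$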

The coefficient $\alpha_0$ is a measure of the mean-square model error

$$R_{nn}(0) = \alpha_0 = E\{n^2(k)\} = E\left\{[y(k) - \hat{y}(k)]^2\right\} \tag{8.51}$$

with $\hat{y}(k)$ as the model prediction for $y(k)$, (8.38). To solve the system of equations (8.50), estimates of the ACF $R_{yy}(\tau)$ for $\tau = 0 \ldots 2m$ have to be determined from the measured signal sequence $y(k)$, $k = 0 \ldots N-1$:

$$\hat{R}_{yy}(\tau) = \frac{1}{N - |\tau|} \sum_{k=0}^{N-1-|\tau|} y(k)\, y(k+\tau) \tag{8.52}$$


For an efficient solution of the equation system (8.50), the Burg algorithm, [8.41], is recommended. Here, the signal model (8.49) is interpreted as a predictor filter for the unknown autocorrelation values $R_{yy}(\tau)$. Starting from the order $m = 1$, the coefficients and the values of the autocorrelation function are estimated alternately up to the final order of $2m$, [8.6].
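A sketch of the coefficient estimation: the ACF is estimated as in (8.52), and the Toeplitz system (8.50) is solved by the Levinson-Durbin recursion. Note that this is a swapped-in technique: the text recommends the Burg algorithm, which estimates the coefficients directly from the data; Levinson-Durbin is used here only because it solves (8.50) exactly as written. The returned error corresponds to $\alpha_0$. Test signals, noise level and function names are assumptions.

```python
import math
import random

def acf_estimate(y, max_lag):
    """R_yy(tau), tau = 0..max_lag, averaging the N - tau available products, cf. (8.52)."""
    n = len(y)
    return [sum(y[k] * y[k + tau] for k in range(n - tau)) / (n - tau)
            for tau in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz system (8.50) by the Levinson-Durbin recursion.

    Returns [1, alpha_1, ..., alpha_order] and the prediction error (alpha_0).
    """
    a = [1.0]
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                                   # reflection coefficient
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k]
        err *= 1.0 - k * k
    return a, err

# Exact ACF of an AR(1) process with phi = 0.6: expect [1, -0.6, 0] and error 1
phi = 0.6
a_ar1, e_ar1 = levinson_durbin([phi ** t / (1 - phi * phi) for t in range(3)], 2)

# One slightly noisy 7 Hz oscillation (m = 1, order 2m = 2), sampled at 100 Hz
random.seed(2)
y = [math.sin(2 * math.pi * 7.0 * k / 100.0) + random.gauss(0.0, 0.02) for k in range(5000)]
a_sin, e_sin = levinson_durbin(acf_estimate(y, 2), 2)
```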

The eigenfrequencies of significant discrete oscillation components in $y(t)$ are calculated by pole decomposition of the denominator polynomial in the AR signal model (8.38)

$$A^*(z) = z^{2m} A(z^{-1}) = z^{2m}\left[1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \ldots + \alpha_{2m} z^{-2m}\right] = \prod_{\nu=1}^{m} \left(z^2 + \alpha_{1\nu} z + \alpha_{2\nu}\right) \tag{8.53}$$
or by searching for the maxima of the spectrum (8.42), i.e. the minima of $|A^*(z)|$ on the unit circle $z = e^{i\omega T_0}$. The resulting conjugate-complex poles $z_\nu$ permit a factorization into quadratic factors, (8.53),

$$z^2 + \alpha_{1\nu} z + \alpha_{2\nu} = (z - z_{\nu 1})(z - z_{\nu 2}) = 0 \tag{8.54}$$

From a corresponding table of the z-transform, e.g. [8.16], one obtains for each conjugate-complex pair of poles ($z_{\nu 1}$, $z_{\nu 2}$ from (8.54)) the angular frequency $\omega_\nu$ of the corresponding sinusoidal partial oscillation $y_\nu(t)$ in $y(t)$

$$\omega_\nu = \frac{1}{T_0} \arccos\left(\frac{-\alpha_{1\nu}}{2\sqrt{\alpha_{2\nu}}}\right) \tag{8.55}$$

In this way, all significant partial oscillation frequencies $\omega_\nu$ of the measured signal $y(t)$ can be calculated.
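For one quadratic factor of (8.54), the frequency (8.55) can be checked against the angle of the complex pole. A minimal sketch with an assumed pole radius of 0.98 and an assumed 7 Hz oscillation at $T_0 = 10$ ms; the function name is hypothetical.

```python
import cmath
import math

def frequency_from_quadratic(alpha1, alpha2, t0):
    """Angular frequency of the pole pair of z**2 + alpha1 z + alpha2, computed two ways."""
    via_arccos = math.acos(-alpha1 / (2.0 * math.sqrt(alpha2))) / t0   # (8.55)
    root = cmath.sqrt(alpha1 * alpha1 - 4.0 * alpha2)                 # imaginary for a complex pair
    z1 = (-alpha1 + root) / 2.0                                       # pole z_nu1 of (8.54)
    via_pole_angle = abs(cmath.phase(z1)) / t0
    return via_arccos, via_pole_angle

t0 = 0.01
r, w_true = 0.98, 2 * math.pi * 7.0            # assumed damped 7 Hz pole pair
alpha1, alpha2 = -2.0 * r * math.cos(w_true * t0), r * r
w_arccos, w_pole = frequency_from_quadratic(alpha1, alpha2, t0)
```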

c) Estimation of the amplitudes

The amplitudes can be determined only inaccurately from the AR signal model (8.49), (8.50). The amplitude $y_{0\nu}$ of each partial oscillation with a significant eigenfrequency $z_\nu$ results with (8.42) from

$$S_{yy}(z_\nu) = \frac{\beta_0^2}{|A(z_\nu)|^2} \tag{8.56}$$

with $A(z_\nu) = A^*(z_\nu)$ given by (8.53). The amplitudes depend on the denominator coefficients $\alpha_i$ and the constant numerator coefficient $\beta_0$. Here, the slightest estimation errors of the coefficients result in large variations of the amplitudes. For this reason, a second estimation stage is performed specifically to determine the amplitudes, [8.33], [8.31].

In the autocorrelation function of periodic oscillations

$$R_{yy}(\tau) = E\{y(t)\, y(t+\tau)\} = \sum_{\nu=1}^{m} \frac{y_{0\nu}^2}{2}\, e^{-d_\nu \tau} \cos(\omega_\nu \tau) \tag{8.57}$$

the damping term can be neglected for small damping values,

$$R_{yy}(\tau) = E\{y(t)\, y(t+\tau)\} = \sum_{\nu=1}^{m} \frac{y_{0\nu}^2}{2} \cos(\omega_\nu \tau) \tag{8.58}$$

without a noticeable influence on the accuracy of the estimation result. With the eigenfrequencies $\omega_\nu$ known from the first estimation stage, one obtains a system of equations for the determination of the required amplitudes


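of the autocorrelation $R_{0\nu}$

$$\begin{bmatrix} R_{yy}(1) \\ R_{yy}(2) \\ \vdots \\ R_{yy}(m) \end{bmatrix} = \begin{bmatrix} \cos(\omega_1 T_0) & \cos(\omega_2 T_0) & \cdots & \cos(\omega_m T_0) \\ \cos(\omega_1 2T_0) & \cos(\omega_2 2T_0) & \cdots & \cos(\omega_m 2T_0) \\ \vdots & \vdots & \ddots & \vdots \\ \cos(\omega_1 mT_0) & \cos(\omega_2 mT_0) & \cdots & \cos(\omega_m mT_0) \end{bmatrix} \begin{bmatrix} R_{01} \\ R_{02} \\ \vdots \\ R_{0m} \end{bmatrix} \tag{8.59}$$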
Considering (8.46), the signal amplitudes can be evaluated from the amplitudes of the autocorrelation with

$$y_{0\nu} = \sqrt{2 R_{0\nu}} \tag{8.60}$$

by inverting the (non-singular) matrix in (8.59).

Thus, a parametric model representation for the required power spectral density $S_{yy}(\omega)$ of the measured signal $y(kT_0)$ has been found, which represents the spectrum by a parametric AR model, or by the frequencies of significant sinusoidal oscillation components with the corresponding amplitudes. For applications, only the sampling time $T_0$, the data set length $N$ and the number $m$ of the expected significant partial oscillations must be specified.
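The second estimation stage, (8.59) and (8.60), amounts to a small linear system once the frequencies are known. A minimal sketch: the frequencies are assumed to be given by the first stage, a plain Gaussian elimination solves the system, and negative $R_{0\nu}$ estimates are clipped to zero (a choice made here, not in the text). Signal values and function names are assumptions.

```python
import math

def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def amplitudes_from_acf(r_yy, omegas, t0):
    """Solve (8.59) for R_0nu, then y_0nu = sqrt(2 R_0nu) as in (8.60)."""
    m = len(omegas)
    matrix = [[math.cos(w * tau * t0) for w in omegas] for tau in range(1, m + 1)]
    r0 = solve_linear(matrix, [r_yy[tau] for tau in range(1, m + 1)])
    return [math.sqrt(2.0 * max(v, 0.0)) for v in r0]   # clip negative estimates

t0 = 0.01
omegas = [2 * math.pi * 5.0, 2 * math.pi * 13.0]        # assumed known from stage one
y = [2.0 * math.sin(omegas[0] * k * t0) + 0.5 * math.sin(omegas[1] * k * t0 + 1.0)
     for k in range(10000)]
n = len(y)
r_yy = [sum(y[k] * y[k + tau] for k in range(n - tau)) / (n - tau) for tau in range(3)]
amplitudes = amplitudes_from_acf(r_yy, omegas, t0)
```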

Figure 8.10 shows the amplitude spectrum of the current of an asynchronous 
motor of a hacksawing machine for nine frequencies. A worn-out saw blade leads to 
higher frequencies of the motor current. Further examples are given in [8.32]. 

The advantage of this modified maximum entropy spectral estimation is that the amplitudes $y_{0\nu}$ and the frequencies of the oscillations are estimated directly. This means that, instead of a distribution of many amplitudes over the frequency range as obtained with an FFT, only a few (e.g. 5) distinct frequencies are determined precisely, together with their amplitudes. Therefore, this method is advantageous for fault detection based on periodic signals. Applications for grinding machines are given in [8.22] and [8.21], see also [8.18].

Fig. 8.10. Estimated amplitudes $y_{0\nu}$ of the current of an asynchronous motor of a hacksawing machine for an intact and a worn-out saw blade, $m = 9$, $N = 100$ (amplitude of motor current [A] over frequency [Hz])


8.1.7 Cepstrum analysis 

For the detection of harmonic signals with small amplitudes among harmonics with larger amplitudes, the cepstrum may be used. There exist different definitions of the cepstrum, [8.43]. In the case of periodic signals $y(t)$, the power cepstrum is defined as the inverse Fourier transform of the logarithm of the power spectrum of the signal $y(t)$

$$C_{yy}(\tau) = \mathcal{F}^{-1}\{\log P(\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \log P(\omega)\, e^{i\omega\tau}\, d\omega \tag{8.61}$$

The power spectrum results from the representation of periodic functions as Fourier series with the Fourier coefficients at the discrete frequencies $\nu\omega_0$, (8.7)

$$c_\nu(i\nu\omega_0) = \frac{1}{T_p} \int_0^{T_p} y(t)\, e^{-i\nu\omega_0 t}\, dt, \qquad \nu = 0, 1, 2, \ldots \tag{8.62}$$

with the period $T_p$ (i.e. $T_p = 2\pi/\omega_0$). Therefore, periodic signals result in a discrete amplitude spectrum with dimension [amplitude] and a discrete phase spectrum. To obtain the power content of each harmonic, one has to take the square of the magnitude of the Fourier coefficient

$$P(\omega)\big|_{\omega = \nu\omega_0} = |c_\nu(i\nu\omega_0)|^2 \tag{8.63}$$

now with dimension [(amplitude)$^2$]. The evaluation of (8.61) is then performed by sampling $P(\omega)$.

The complex cepstrum is defined as

$$C_y(\tau) = \mathcal{F}^{-1}\{\log y(i\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \log y(i\omega)\, e^{i\omega\tau}\, d\omega \tag{8.64}$$

where $y(i\omega)$ is the Fourier transform of the signal $y(t)$.

The cepstrum was probably first defined as a “spectrum of a logarithmic spectrum”, [8.3], as a better alternative to autocorrelation functions for the detection of echoes in seismic signals. Presumably because it was a spectrum of a spectrum, [8.3] coined the word ceps-trum by rearranging the letters of spec-trum. Other terms were created as well, like que-frency from fre-que-ncy or saphe from ph-as-e. Therefore, $C_{yy}(\tau)$ is a function of the quefrency $\tau$ with dimension [s].

Note that the autocorrelation function (ACF) of a stationary stochastic signal is obtained by, compare (6.21),

$$R_{yy}(\tau) = \mathcal{F}^{-1}\{S_{yy}(i\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_{yy}(i\omega)\, e^{i\omega\tau}\, d\omega \tag{8.65}$$

where $S_{yy}(i\omega)$ is the power spectral density. If FFT analyzers are used, the spectrum is determined first and the ACF is then obtained by the inverse Fourier transform, instead of operating in the time domain, [8.43], [8.35].

The advantages of a cepstrum with respect to an autocorrelation function become apparent if the power spectrum of a signal, with dimension [(amplitude)$^2$], is compared on a linear and on a logarithmic scale. The power spectrum on the logarithmic scale shows many more peaks than on the linear scale, where only the main harmonics dominate. This is because the logarithm compresses large values relative to small ones, so that small values become comparatively more visible. Therefore, the cepstrum shows harmonics with small amplitudes, with their periods in [s], more clearly, as shown in an example for a ball bearing fault in [8.43]. Hence, the cepstrum is suitable for detecting periodic effects in the logarithmic spectrum, like families of harmonics, sidebands or echoes. It is used, e.g., in speech analysis, seismic analysis and machine fault detection, see also [8.25].
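The echo-detection use of the power cepstrum can be illustrated with a decaying pulse plus a delayed copy: the echo appears as a peak at its delay in the quefrency domain. A minimal discrete sketch of (8.61) using a direct DFT; pulse shape, echo delay (30 samples), echo gain (0.5) and function names are assumptions. For this construction, the cepstral value at the delay equals the echo gain.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT with 1/N scaling); O(N**2), for illustration only."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def power_cepstrum(y):
    """Inverse DFT of the log power spectrum, a discrete counterpart of (8.61)."""
    log_power = [math.log(abs(c) ** 2) for c in dft(y)]
    return [c.real for c in dft(log_power, inverse=True)]

N, delay, gain = 256, 30, 0.5
pulse = [0.9 ** k for k in range(N)]                      # smooth broadband source
y = [pulse[k] + (gain * pulse[k - delay] if k >= delay else 0.0) for k in range(N)]

c = power_cepstrum(y)
q_peak = max(range(5, N // 2), key=lambda q: c[q])       # skip the low-quefrency part
```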

The complex cepstrum is said to be more powerful than the power cepstrum, but it is more difficult to deal with, [8.43]. It also contains phase information, and it is possible to return to the original time domain. Applications are, for example, echo removal and speech synthesis.


8.2 Analysis of non-stationary periodic signals 

Many signals have no constant frequency spectrum but change their frequency content over time. These non-stationary signals should not be analyzed with the conventional Fourier transform, because it produces only averaged results that are not associated with particular time instants. Two approaches for the analysis of these non-stationary periodic signals are considered in this section: the short-time Fourier transform and the wavelet transform.

8.2.1 Short-time Fourier transform 

A straightforward way to obtain the frequency spectrum of a time-varying periodic signal $y(t)$ is to apply the short-time Fourier transform (STFT)

$$\text{STFT}(\omega, \tau) = \int_{-\infty}^{\infty} y(t)\, f(t-\tau)\, e^{-i\omega t}\, dt \tag{8.66}$$

where $f(t-\tau)$ is a window function around the time $\tau$ of interest. The STFT calculates the similarity between the signal $y(t)$ and the function $f(t-\tau) \exp(i\omega t)$. The function $f(t-\tau)$ usually has a short duration. By varying $\tau$, the STFT describes how the frequency content evolves over time. In selecting the length of the window function $f(t-\tau)$, a compromise always has to be made between the time resolution $\Delta t$ of the signal and the frequency resolution $\Delta\omega$ of the spectrum, because of the uncertainty principle, [8.42]. If, e.g., a fine resolution of the spectrum is desired, the window must be long.
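A discrete version of (8.66) slides a Hanning-windowed DFT along the signal and reports the dominant frequency per frame. A minimal sketch; the frame length (64 samples), hop size, sampling rate and the two test frequencies (chosen to fall exactly on DFT bins) are assumptions, and the function name is hypothetical. A longer frame would sharpen the frequency resolution and blur the time at which the change occurs.

```python
import cmath
import math

def stft_peaks(y, fs, frame=64, hop=32):
    """Sliding Hanning-windowed DFT (discrete form of (8.66)).

    Returns (frame start time in s, dominant frequency in Hz) for every frame.
    """
    window = [0.5 * (1.0 - math.cos(2 * math.pi * k / frame)) for k in range(frame)]
    result = []
    for start in range(0, len(y) - frame + 1, hop):
        seg = [w * v for w, v in zip(window, y[start:start + frame])]
        mags = [abs(sum(seg[k] * cmath.exp(-2j * math.pi * k * m / frame)
                        for k in range(frame)))
                for m in range(frame // 2 + 1)]
        m_peak = max(range(1, frame // 2 + 1), key=lambda m: mags[m])   # ignore DC
        result.append((start / fs, m_peak * fs / frame))
    return result

fs = 1024.0
y = [math.sin(2 * math.pi * (96.0 if k < 512 else 256.0) * k / fs) for k in range(1024)]
peaks = stft_peaks(y, fs)   # frequency switches from 96 Hz to 256 Hz at t = 0.5 s
```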

8.2.2 Wavelet transform 

The STFT determines the similarity between the investigated signal and a windowed harmonic signal. In order to obtain a better approximation of short-time signal changes with sharp transients, the similarity with a short-time prototype function of finite duration can be calculated instead. Such prototype or basis functions, which show some damped oscillating behavior, are wavelets, which originate from a mother wavelet $\Psi(t)$, [8.42], [8.2], [8.52]. Figure 8.11 shows some typical mother wavelets. These mother wavelets can now be time-scaled (dilation) by the factor $a$ and time-shifted (translation) by $\tau$, which leads to

$$\Psi^*(t, a, \tau) = \frac{1}{\sqrt{a}}\, \Psi\left(\frac{t-\tau}{a}\right) \tag{8.67}$$

(The factor $1/\sqrt{a}$ is introduced in order to reach a correct scaling of the power density spectrum, [8.2].) If the mean frequency of the wavelet is $\omega_0$, scaling the wavelet by $t/a$ results in the scaled mean frequency $\omega_0/a$.




Fig. 8.11. Mother wavelets: (a) Haar; (b) Daubechies 2nd order; (c) Mexican hat


The continuous-time wavelet transformation (CWT) then becomes

$$\text{CWT}(a, \tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} y(t)\, \Psi\left(\frac{t-\tau}{a}\right) dt \tag{8.68}$$

which is a real function for real $y(t)$ and $\Psi(t)$. Note that the STFT is usually a complex function. Examples of wavelet functions are

1) Morlet wavelet

$$\Psi(t) = e^{-t^2/\tau^2}\, e^{i2\pi f_0 t}$$

2) Mexican-hat wavelet

$$\Psi(t) = (1 - t^2)\, e^{-t^2/2}$$

3) One-cycle-sine wavelet


The advantages of the wavelet transform stem from the signal-adapted basis function and the better resolution in time and frequency. The wavelet functions correspond to certain bandpass filters, where, for example, a reduction of the mean frequency through the scale factor also reduces the bandwidth, whereas for the STFT the bandwidth stays constant, [8.50].

Example 8.3: 

A periodic signal with changing frequency and some overlap between the two frequencies is considered:

$$y(t) = \begin{cases} \cos(2\pi f_1 t) & t/T_0 < 400 \\ \cos(2\pi f_1 t) + \cos(2\pi f_2 t) & 400 \le t/T_0 \le 724 \\ \cos(2\pi f_2 t) & t/T_0 > 724 \end{cases}$$

$$f_1 = 80\ \text{Hz}; \quad f_2 = 240\ \text{Hz}; \quad T_0 = 125\ \mu\text{s}$$

The Mexican hat wavelet was used for the wavelet transform, [8.50].

Figure 8.12b shows the time-shifted wavelets for $a = 9$ and $a = 25$. Figure 8.12c presents the results of the CWT, indicating the amplitude for different $a$ and $\tau$. Figures 8.12d and e show the calculated wavelet coefficients $\Psi(a, \tau)$ as functions of time for $a = 25$ and $a = 9$, respectively. The maximum values of $\Psi(a, \tau)$ are marked with a square and a circle and indicate the best agreement between the analyzed signal $y(t)$ and the wavelets. Hence, this example shows that the changed frequencies can be detected immediately by taking the maximum values of the CWT.

□ 
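Example 8.3 can be approximated with a discretized (8.68) using the Mexican hat wavelet, with $t$ and $\tau$ measured in samples ($t/T_0$). A minimal sketch: the truncation of the wavelet at $\pm 8a$, the evaluation windows for $\tau$ and the function names are assumptions. Scale $a = 25$ responds mainly to the 80 Hz part and $a = 9$ mainly to the 240 Hz part, in line with Figure 8.12d and e.

```python
import math

def mexican_hat(t):
    """Mexican hat mother wavelet (1 - t^2) exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt(y, a, tau):
    """Discretized (8.68): t and tau in samples, wavelet truncated at |t - tau| <= 8a."""
    lo = max(0, int(tau - 8 * a))
    hi = min(len(y), int(tau + 8 * a) + 1)
    return sum(y[k] * mexican_hat((k - tau) / a) for k in range(lo, hi)) / math.sqrt(a)

T0, f1, f2 = 125e-6, 80.0, 240.0

def example_signal(k):
    """Signal of Example 8.3 with k = t / T0."""
    t = k * T0
    if k < 400:
        return math.cos(2 * math.pi * f1 * t)
    if k <= 724:
        return math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t)
    return math.cos(2 * math.pi * f2 * t)

y = [example_signal(k) for k in range(1200)]

def peak_magnitude(a, taus):
    return max(abs(cwt(y, a, tau)) for tau in taus)

early = {a: peak_magnitude(a, range(150, 251)) for a in (9, 25)}   # only f1 present
late = {a: peak_magnitude(a, range(880, 981)) for a in (9, 25)}    # only f2 present
```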

Sudden changes in the amplitude or frequency of periodic signals can therefore be detected when the CWT exceeds certain thresholds. An application of the wavelet transform to misfire detection in a 6-cylinder gasoline engine, based on the measured exhaust gas pressure, was demonstrated by [8.50].


8.3 Analysis of stochastic signals 

The analysis of stochastic signals applies the signal model equations described in 
Section 6.2. 

8.3.1 Correlation analysis 

For single signals, the autocorrelation function (ACF) is used, mostly for sampled signals in the form of

$$\Phi_{yy}(\tau) = \frac{1}{N} \sum_{k=0}^{N-1-|\tau|} y(k)\, y(k+\tau) \tag{8.69}$$

Fig. 8.12. Application of the wavelet transform to periodic signals: (a) signal to be analyzed; (b) scaled and time-shifted Mexican hat wavelets; (c) scalogram $\Psi(a, \tau)$ for $a = 1 \ldots 41$ and $\tau = 0 \ldots 1000$ (bright means large amplitude); (d) time behavior of wavelet coefficient $\Psi(a, \tau)$ for $a = 25$; (e) time behavior of wavelet coefficient $\Psi(a, \tau)$ for $a = 9$ (horizontal axis: time $t/T_0$, translation)

or if the mean x is known in form of the auto-covariance function 


Ryy( z ) 


N -1 

- y(k) y(k + r) - y 2 


(8.70) 


For the following discussion it is assumed that ȳ = 0. For finite N only N − |τ| products exist, so (8.70) has a bias b:

$$\mathrm{E}\{\hat{R}_{yy}(\tau)\} = R_{yy}(\tau) + b, \qquad b = -\frac{|\tau|}{N}\,R_{yy}(\tau) \tag{8.71}$$


The bias vanishes for N → ∞, so the estimate is consistent. An alternative would be to use N − |τ| instead of N in the denominator. However, the variance of the estimates then increases, see [8.17]. Therefore, (8.70) is preferred, and N should be sufficiently large compared to |τ|.
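The sketch below compares the estimate with divisor N, as in (8.70), against the alternative with divisor N − |τ|. It assumes a zero-mean signal. The function names are illustrative, and the mean subtraction is omitted in line with the assumption ȳ = 0. The test signal y(k) = (−1)^k is chosen because its lag products are all ±1. The ratio of the two estimates is then exactly (N − |τ|)/N, which is the factor contained in the bias term b.

```python
def acf_biased(y, tau):
    """Estimate (8.70) for a zero-mean signal: N - |tau| products, divisor N."""
    n, tau = len(y), abs(tau)
    return sum(y[k] * y[k + tau] for k in range(n - tau)) / n


def acf_unbiased(y, tau):
    """Alternative estimate with divisor N - |tau| (unbiased, larger variance)."""
    n, tau = len(y), abs(tau)
    return sum(y[k] * y[k + tau] for k in range(n - tau)) / (n - tau)


y = [(-1.0) ** k for k in range(10)]   # zero-mean test signal, N = 10
for tau in (0, 3, 9):
    print(tau, acf_biased(y, tau), acf_unbiased(y, tau))
```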

For on-line application in real time, the ACF can also be turned into a recursive correlation estimator

$$\hat{R}_{yy}(\tau,k) = \hat{R}_{yy}(\tau,k-1) + \frac{1}{k+1}\left[\,y(k-\tau)\,y(k) - \hat{R}_{yy}(\tau,k-1)\right] \tag{8.72}$$

that is, new estimate = old estimate + correction factor × [new product − old estimate], see [8.17].
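The sketch below implements the recursive estimator (8.72) for one fixed lag τ. It assumes y(k) = 0 for k < 0 and keeps only the last τ + 1 samples. With the correction factor 1/(k + 1), the recursion then reproduces the non-recursive estimate with divisor N = k + 1. The class name is hypothetical.

```python
from collections import deque


class RecursiveACF:
    """Recursive correlation estimator (8.72) for a fixed lag tau."""

    def __init__(self, tau):
        self.tau = tau
        self.k = -1                          # index of the most recent sample
        self.estimate = 0.0
        self.buffer = deque(maxlen=tau + 1)  # holds y(k - tau), ..., y(k)

    def update(self, y_new):
        self.k += 1
        self.buffer.append(y_new)
        # y(k - tau) is taken as zero until tau + 1 samples have arrived
        y_old = self.buffer[0] if self.k >= self.tau else 0.0
        new_product = y_old * y_new
        # new estimate = old estimate + 1/(k+1) * (new product - old estimate)
        self.estimate += (new_product - self.estimate) / (self.k + 1)
        return self.estimate


acf = RecursiveACF(tau=1)
for value in [1.0, 2.0, 3.0, 4.0, 5.0]:
    current = acf.update(value)
print(current)   # equals (1*2 + 2*3 + 3*4 + 4*5) / 5
```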

8.3.2 Spectrum analysis 

The power spectral density is defined as the Fourier transform of the auto-covariance function. The results of Section 8.1.4 therefore apply, with the auto-covariance function R_yy(τ) taking the place of the signal y(kT_0). Hence the power spectral density becomes

$$S_{yy}(\omega) = \sum_{\tau=-\infty}^{\infty} R_{yy}(\tau)\, e^{-i\omega\tau T_0} \tag{8.73}$$

This is the two-sided discrete Fourier transform, compare (6.21). R_yy(τ) is symmetric, R_yy(−τ) = R_yy(τ). For a large number N of measurements the one-sided form

$$S_{yy}(\omega) = R_{yy}(0) + 2\sum_{\tau=1}^{\infty} R_{yy}(\tau)\cos(\omega\tau T_0) \tag{8.74}$$

can therefore be used as well. The fast Fourier transform (FFT), Section 8.1.4, can then be applied directly.
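The sketch below evaluates the one-sided form (8.74) for a finite set of autocovariance values and checks it against the direct two-sided sum (8.73). It assumes R_yy(τ) = 0 beyond the last supplied lag and T_0 = 1. The check signal is a hypothetical moving average y(k) = v(k) + 0.5 v(k − 1) driven by unit-variance white noise. It has R_yy(0) = 1.25 and R_yy(1) = 0.5, so S_yy(ω) = 1.25 + cos(ωT_0).

```python
import cmath
import math


def psd_one_sided(R, omega, T0=1.0):
    """(8.74): S(omega) = R(0) + 2 * sum_{tau>=1} R(tau) cos(omega tau T0)."""
    return R[0] + 2.0 * sum(R[tau] * math.cos(omega * tau * T0)
                            for tau in range(1, len(R)))


def psd_two_sided(R, omega, T0=1.0):
    """(8.73) evaluated directly, using the symmetry R(-tau) = R(tau)."""
    total = sum(R[abs(tau)] * cmath.exp(-1j * omega * tau * T0)
                for tau in range(-(len(R) - 1), len(R)))
    return total.real


R = [1.25, 0.5]   # autocovariance of y(k) = v(k) + 0.5 v(k-1)
for omega in (0.0, math.pi / 2, math.pi):
    print(omega, psd_one_sided(R, omega), psd_two_sided(R, omega))
```

For long records one would evaluate such sums on a whole frequency grid with an FFT instead of the direct sums used here.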

8.3.3 Signal parameter estimation with ARMA-models 

If the autoregressive moving-average (ARMA) model (6.30)

$$y(k) + c_1 y(k-1) + \dots + c_p y(k-p) = d_0 v(k) + d_1 v(k-1) + \dots + d_p v(k-p) \tag{8.75}$$

is applied, the parameters c_i and d_j have to be estimated. Assume that the order p is known and that d_0 = 1. The ARMA model can then be written as

$$y(k) = \psi^T(k)\,\theta + v(k) \tag{8.76}$$

where

$$\psi^T(k) = [\,-y(k-1)\ \dots\ -y(k-p)\quad v(k-1)\ \dots\ v(k-p)\,] \tag{8.77}$$

$$\theta^T = [\,c_1\ \dots\ c_p\quad d_1\ \dots\ d_p\,] \tag{8.78}$$

If the white noise values v(k − 1), ..., v(k − p) were known, the recursive least squares method (RLS) could be used, see Section 8.1. Here v(k) can be interpreted as an equation error, which is statistically independent by definition.

At time k, after y(k) has been measured, the values y(k − 1), ..., y(k − p) are known. Assume that the estimates v̂(k − 1), ..., v̂(k − p) and θ̂(k − 1) are also known. The most recent signal value v(k) can then be estimated using (8.76):

$$\hat{v}(k) = y(k) - \hat{\psi}^T(k)\,\hat{\theta}(k-1) \tag{8.79}$$

where

$$\hat{\psi}^T(k) = [\,-y(k-1)\ \dots\ -y(k-p)\quad \hat{v}(k-1)\ \dots\ \hat{v}(k-p)\,] \tag{8.80}$$

Then

$$\hat{\psi}^T(k+1) = [\,-y(k)\ \dots\ -y(k-p+1)\quad \hat{v}(k)\ \dots\ \hat{v}(k-p+1)\,] \tag{8.81}$$

is also determined, so the recursive LS parameter estimation algorithms of Section 8.1 can be used. For more details see [8.19]. This method can be applied to stationary signals and also to special types of non-stationary signals.
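The procedure (8.79) to (8.81), in which the unknown noise values are replaced by their estimates, is commonly called recursive extended least squares (RELS). The sketch below shows it for p = 1. It uses a priori residuals as in (8.79) and a standard RLS update with initial covariance 1000·I. The data are a simulated ARMA(1,1) signal with assumed values c_1 = −0.7, d_1 = 0.4 and d_0 = 1. All function names are hypothetical.

```python
import random


def simulate_arma11(c1, d1, n, seed=1):
    """Simulate y(k) + c1 y(k-1) = v(k) + d1 v(k-1), cf. (8.75) with p = 1, d0 = 1."""
    rng = random.Random(seed)
    y, y_prev, v_prev = [], 0.0, 0.0
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)          # white noise v(k)
        y_k = -c1 * y_prev + v + d1 * v_prev
        y.append(y_k)
        y_prev, v_prev = y_k, v
    return y


def past(seq, k, i):
    """seq(k - i), taken as zero before the start of the record."""
    return seq[k - i] if k - i >= 0 else 0.0


def rels_arma(y, p):
    """Recursive extended least squares; returns theta = [c1..cp, d1..dp]."""
    n = 2 * p
    theta = [0.0] * n
    P = [[1000.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    v_hat = [0.0] * len(y)
    for k in range(len(y)):
        # data vector (8.80) with past outputs and past noise estimates
        psi = [-past(y, k, i) for i in range(1, p + 1)]
        psi += [past(v_hat, k, i) for i in range(1, p + 1)]
        # (8.79): a priori estimate of the newest noise value
        v_hat[k] = y[k] - sum(a * b for a, b in zip(psi, theta))
        # RLS gain gamma = P psi / (1 + psi^T P psi)
        P_psi = [sum(P[i][j] * psi[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(psi[i] * P_psi[i] for i in range(n))
        gamma = [x / denom for x in P_psi]
        theta = [theta[i] + gamma[i] * v_hat[k] for i in range(n)]
        # covariance update P <- P - gamma psi^T P (P stays symmetric)
        P = [[P[i][j] - gamma[i] * P_psi[j] for j in range(n)] for i in range(n)]
    return theta


theta = rels_arma(simulate_arma11(c1=-0.7, d1=0.4, n=10000), p=1)
print([round(t, 3) for t in theta])
```

Because v̂(k) from step k enters the data vector of step k + 1, the sketch follows (8.81) directly.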

This chapter on fault detection with signal models has shown how certain features of measured signals can be generated. According to Figure 8.1, the observed features are compared with the normal features. If the differences exceed a threshold, the exceeding values represent symptoms. Threshold crossings of this kind can be detected by the methods of limit checking and change detection described in Chapter 7.


8.4 Vibration analysis of machines 

8.4.1 Vibrations of rotating machines 

Many machines contain drive systems with motors, clutches, gears, shafts, belts or chains, and different ball/rolling or oil bearings. Vibrations are usually generated by

• inherent machine oscillations (e.g. piston-crankshaft, toothed machine tool cutting, axial piston pumps, induction motors);

• shaft oscillations with radial or axial displacement;

• irregular speed of the shaft (e.g. cardan joint or eccentric gears);

• torsional shaft oscillation;

• impulsewise excitation (e.g. through backlash, cracks, pittings, broken gear teeth).

Some of the vibrations indicate a normal state of the machines. However, changes of these vibrations, and vibrations that appear in addition, may be caused by faults. Therefore vibration analysis is a well-established field in machine monitoring and supervision, [8.43], [8.54], [8.53], [8.25], [8.13].
Machine vibrations are usually measured at the casing of the machines as acceleration a(t). Lateral accelerometers are used in one, two or three orthogonal directions (horizontal, vertical and axial), or rotational accelerometers are used. There is therefore a machine transfer behavior between the source of the vibrations and the measurement location. This transfer behavior, expressed by a frequency response G_m(iω), may contain one or several resonance frequencies ω_res,i of the machine structure, resulting from different mass-spring-damper systems.

The measurement principle of the accelerometer is based, e.g., on the measurement of forces, as for piezoelectric force sensors, or on the measurement of the displacement of a seismic mass, as with inductive sensors. Usually a high-pass filter with a cut-off frequency of, e.g., 100-200 Hz follows the accelerometer in order to attenuate low-frequency disturbances.

Instead of the acceleration a(t), the vibration speed v(t) or the vibration displacement d(t) can also be measured. If, for example, a sinusoidal oscillation of the displacement

$$d(t) = d_0 \sin\omega t$$

is considered, the other signals are related by

$$v(t) = \dot{d}(t) = d_0\,\omega\cos\omega t \tag{8.82}$$

$$a(t) = \ddot{d}(t) = -d_0\,\omega^2 \sin\omega t \tag{8.83}$$

Higher-frequency components are therefore better represented by measuring the acceleration, whose amplitude grows with ω². Acceleration is also easier to measure than speed or displacement, which both need a non-oscillating reference point. However, high-frequency noise is amplified as well, which may reduce the signal-to-noise ratio. This requires appropriate low-pass filtering of the very high frequencies, see Figure 8.13.
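The amplitude relations (8.82) and (8.83) can be put into numbers. For a fixed displacement amplitude, the velocity amplitude grows with ω and the acceleration amplitude grows with ω². The displacement amplitude d_0 = 0.1 mm used below is an assumption for illustration.

```python
import math


def vibration_amplitudes(d0, f):
    """Return (v0, a0) for d(t) = d0 sin(omega t), omega = 2 pi f, cf. (8.82), (8.83)."""
    omega = 2.0 * math.pi * f
    return d0 * omega, d0 * omega ** 2


for f in (10.0, 100.0, 1000.0):
    v0, a0 = vibration_amplitudes(1e-4, f)       # d0 = 0.1 mm
    print(f"f = {f:6.0f} Hz: v0 = {v0:.4f} m/s, a0 = {a0:.2f} m/s^2")
```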

In the following, the modelling of vibration signals is considered first, especially the frequencies caused by bearing and gear defects. Then some applicable vibration analysis methods are considered, based on the earlier sections of this chapter. In general, the analyzed vibration signal is denoted by y(t), which stands for a(t), v(t) or d(t).

8.4.2 Vibration signal models 

Faults in rotating machines may generate additional stationary harmonic signals or pulsewise signals. Harmonic signals arise from linearly superimposed effects, for example unbalance, inexact alignment or centering, deformed shafts, tooth faults, ball bearing faults, or electrical flux differences in electrical motors. They also arise from changes of the machine's periodic operation. The resulting harmonic signals may then appear as additive vibrations, like

$$y(t) = y_1(t) + y_2(t) + \dots + y_n(t) = \sum_{i=1}^{n} y_i(t) \tag{8.84}$$



signal models 
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'ibration analysis methods for detection and diagnosis of rotating machinery faults (bearings, gears; 
Typical frequencies in machines with gears and ball or roller bearings are:

ω_s basic rotational shaft angular frequency
ω_t tooth contact frequency, ω_t = z ω_s (z: number of teeth)
ω_or outer race frequency (frequency of rolling over an uneven point of the outer ring)
ω_ir inner race frequency
ω_c cage frequency
ω_b roller or ball spin frequency

The ball or roller frequencies can be calculated from geometrical data under the assumption that no slip occurs, see, e.g., [8.53], [8.13], [8.9]. With significant thrust loads and internal preloads, however, the contact angles change and slip occurs, so these frequencies shift, [8.54].
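The sketch below evaluates the widely used kinematic bearing formulas, which this text does not state (see the cited references). They assume no slip and a stationary outer ring. The inputs are the shaft frequency f_s, the number of rolling elements n_b, the ball diameter D_b, the pitch diameter D_p and the contact angle β. The bearing data are hypothetical. Frequencies are in Hz, with ω = 2πf. Conventions for the spin frequency differ: some sources double it, because a ball defect strikes both races.

```python
import math


def bearing_frequencies(f_s, n_b, d_ball, d_pitch, beta=0.0):
    """Kinematic bearing frequencies in Hz (stationary outer ring, no slip)."""
    r = d_ball / d_pitch * math.cos(beta)
    f_cage = 0.5 * f_s * (1.0 - r)
    f_outer = 0.5 * n_b * f_s * (1.0 - r)    # ball pass frequency, outer race
    f_inner = 0.5 * n_b * f_s * (1.0 + r)    # ball pass frequency, inner race
    f_spin = 0.5 * f_s * d_pitch / d_ball * (1.0 - r * r)
    return {"cage": f_cage, "outer": f_outer, "inner": f_inner, "spin": f_spin}


# hypothetical ball bearing on a shaft running at 1500 rpm (f_s = 25 Hz)
freqs = bearing_frequencies(f_s=25.0, n_b=9, d_ball=7.9, d_pitch=39.0)
for name, value in freqs.items():
    print(f"{name:5s}: {value:7.2f} Hz")
```

Two quick plausibility checks follow from the formulas. The outer and inner race frequencies add up to n_b·f_s, and the outer race frequency equals n_b times the cage frequency.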

As shown in Section 6.1, additional frequencies also appear through nonlinear effects. These include nonlinear characteristics and backlash. They also include mechanical looseness of the machine itself or of its parts, caused by foundation cracks or broken mountings. Finally, hysteresis from dry friction and stick-slip effects produces such frequencies.

Amplitude modulation of a basic waveform is another source of additional frequencies. A typical example is a pair of gears where one gear is not centered precisely. Suppose the non-centered gear rotates with angular frequency ω_1 and has 22 teeth. The tooth contact then produces 22 impacts per revolution, giving the angular frequency ω_2 = 22 ω_1 with amplitude y_02. Because the gear is not centered, it moves closer to and farther away from the second gear, so the contact forces oscillate. The amplitude of the tooth impacts therefore oscillates with ω_1. The level of the impacts is thus modulated (changed) at the rotation frequency ω_1 of the non-centered gear, with amplitude y_01, see Figure 8.14. The resulting oscillation is

$$\begin{aligned} y(t) &= y_{02}\sin\omega_2 t\,\left[1 + y_{01}\sin(\omega_1 t + \varphi_1)\right] \\ &= y_{02}\sin\omega_2 t + \frac{y_{01}\,y_{02}}{2}\left[\cos\big((\omega_2-\omega_1)t - \varphi_1\big) - \cos\big((\omega_2+\omega_1)t + \varphi_1\big)\right] \end{aligned} \tag{8.85}$$

Thus, the tooth contact frequency ω_2 and the two sideband frequencies ω_2 − ω_1 and ω_2 + ω_1 can be observed, see Figure 8.14d. The rotation frequency ω_1 itself does not appear.
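The sketch below is a numerical check of (8.85). It samples an amplitude-modulated tooth-contact oscillation and evaluates single-sided DFT amplitudes at selected frequencies. The values are assumed: rotation frequency f_1 = 5 Hz, 22 teeth (so f_2 = 110 Hz), y_02 = 1, y_01 = 0.3, and sampling at 1 kHz for exactly 1 s. With these settings every component falls exactly on a DFT bin.

```python
import cmath
import math

FS, N = 1000.0, 1000                 # sampling frequency in Hz, number of samples
F1, Z = 5.0, 22                      # rotation frequency in Hz, number of teeth
F2 = Z * F1                          # tooth contact frequency (110 Hz)
Y02, Y01, PHI1 = 1.0, 0.3, 0.4

y = [Y02 * math.sin(2 * math.pi * F2 * k / FS)
     * (1.0 + Y01 * math.sin(2 * math.pi * F1 * k / FS + PHI1)) for k in range(N)]


def amplitude_at(x, f, fs=FS):
    """Single-sided amplitude of the DFT of x at frequency f (f must lie on a bin)."""
    s = sum(v * cmath.exp(-2j * math.pi * f * k / fs) for k, v in enumerate(x))
    return 2.0 * abs(s) / len(x)


for f in (F1, F2 - F1, F2, F2 + F1):
    print(f"{f:6.1f} Hz: {amplitude_at(y, f):.3f}")
```

The output shows amplitude 1 at 110 Hz, y_01·y_02/2 = 0.15 at 105 Hz and 115 Hz, and nothing at the rotation frequency of 5 Hz, as (8.85) predicts.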

The non-centered gear may also cause a frequency modulation, because its effective radius changes as it moves closer to and farther from the other gear. The frequency ω_2 of the tooth contact therefore oscillates, resulting in

$$y(t) = y_{02}\sin\big([\omega_2 + y_{01}\sin\omega_1 t]\,t\big) \tag{8.86}$$

Here, too, sidebands arise at the same frequencies as for amplitude modulation, [8.10]. Similar sideband effects are observed for rolling element bearings and for rotor bar defects in electrical motors.

Fig. 8.14. Vibration signals and Fourier spectrum for a non-centered gear with eccentricity e: (a) scheme of a one-stage gear with eccentricity; (b) gear with tooth contact frequency ω_2; (c) non-centered gear with rotation frequency ω_1; (d) amplitude-modulated tooth contact frequency

Another form of vibration consists of periodic impulses, for example in ball or rolling bearings. An impact pulse is generated every time a ball or roller hits a defect in the raceway, or every time a defect in a ball hits the raceway. Each such impulse excites a short transient vibration of the bearings and the mechanical structures at their natural frequencies, for example rigid-body frequencies, see Figure 8.15a. Consider one dominating mass-spring-damper system with eigenfrequency ω_0e. The momentum of each impact then generates a damped impulse response of the position d(t) at the measurement location:

$$d_0(t) = a_0\, e^{-\delta t}\sin\omega_{0e} t \tag{8.87}$$

This damped vibration with decay constant δ repeats with period T_imp, i.e. with angular impact frequency ω_imp = 2π/T_imp. The resulting signal is then described by
$$d(t) = \sum_{\nu} d_0(t - \nu T_{imp}) \tag{8.88}$$

Because d(t) is periodic in time, its Fourier transform consists of discrete lines at the frequencies ν ω_imp, ν = 0, 1, 2, .... The lines are weighted by the spectrum of the single impulse response d_0(t). This is the dual of the periodic spectra known from sampled-data systems. Hence, a frequency spectrum shows peaks at the frequencies ν ω_imp.

The described signal models for rotating machinery are valid for single ball/rolling bearings and single gear stages. If several different bearings and gears act together, their vibration effects add up. Linear superposition and nonlinear effects then create several distinct frequencies and several sideband frequencies. Hence, the frequency spectra become increasingly complex, and it may not be straightforward to isolate the effects and to diagnose the faults.


Fig. 8.15. (a) Vibration x(t) caused by periodic impulses; (b) Envelopes |x_p(t)| after low-pass filtering of x(t) and rectification



8.4.3 Vibration analysis methods 

The goal of vibration analysis is to extract features from the measurements x(t) or x(kT_0), with sampling time T_0, for use in fault detection and diagnosis. According to Shannon's sampling theorem, the sampling frequency ω_0 = 2π/T_0 has to satisfy ω_0 > 2ω_max, where ω_max is the highest frequency of interest. To avoid aliasing, anti-aliasing low-pass filters with ω_cut = ω_0/2 > ω_max have to be applied. The corresponding analysis methods operate in the time domain, the frequency domain or in both the time and frequency domain, see, e.g., [8.54], [8.53], [8.25], [8.13], [8.9] and Figure 8.13.

a) Time domain methods

Autocorrelation functions (ACF) according to (8.9), or autocovariance functions, allow the vibration signal to be separated from noise. The vibration signals then appear as periodic functions R_yy(τ + ν T_p,i), ν = 0, 1, 2, ..., for the harmonics i. However, interpretation by inspection is only possible for very few harmonics. Otherwise the ACF can be used as data preprocessing for frequency analysis methods.

The power cepstrum according to (8.61) is defined as the inverse Fourier transform of the logarithmic power spectrum, with angular frequency ω [1/s], back into the time domain. It is used when harmonics with small amplitudes have to be detected among dominating harmonics, as for ball bearing faults, [8.43], [8.25]. Impulses from defects show up as distinct peaks spaced by one rotation period.
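The sketch below computes the power cepstrum as the inverse DFT of the logarithmic power spectrum and takes the real part. Whether (8.61) additionally squares the result is not shown in this section, so that detail is an assumption. The test signal is hypothetical: white noise plus a copy of itself delayed by 40 samples. The echo causes a periodic ripple in the log spectrum, and this ripple shows up as a peak at quefrency 40 samples. Repeated impacts appear at their repetition time in the same way. A plain O(N²) DFT keeps the sketch free of dependencies.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (the inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    twiddle = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    out = [sum(x[k] * twiddle[(k * m) % n] for k in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out


def power_cepstrum(x, eps=1e-12):
    """Inverse DFT of the logarithmic power spectrum; real part, quefrency in samples."""
    log_power = [math.log(abs(X) ** 2 + eps) for X in dft(x)]
    return [c.real for c in dft(log_power, inverse=True)]


rng = random.Random(0)
N, DELAY = 512, 40
w = [rng.gauss(0.0, 1.0) for _ in range(N)]
x = [w[k] + 0.5 * w[(k - DELAY) % N] for k in range(N)]   # circular echo after 40 samples
c = power_cepstrum(x)
q_peak = max(range(10, N // 2), key=lambda q: c[q])
print("cepstrum peak at quefrency", q_peak, "samples")
```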

For the detection of peaks and outliers, the measured signal x(k) can be used to calculate a crest factor and a kurtosis factor. The crest factor is defined as

$$c_r = \frac{\max_k |x(k)|}{\sqrt{\overline{x^2(k)}}} \tag{8.89}$$

and the kurtosis factor as

$$kur = \frac{\overline{\big(x(k) - \bar{x}\big)^4}}{\left[\,\overline{\big(x(k) - \bar{x}\big)^2}\,\right]^2} \tag{8.90}$$

where the overbar denotes the mean value over k. Both factors process the measured vibration signal directly. They weight peaks and outliers strongly compared with the normal signal components, which makes changes or abnormalities visible.
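The sketch below computes (8.89) and (8.90) for two hypothetical signals: a pure sine over whole periods, and the same sine with occasional impulses added. For a sine, the crest factor is √2 and the kurtosis is 1.5. Impulses raise both values markedly, which is why these factors are used to detect impacts.

```python
import math


def crest_factor(x):
    """(8.89): peak value divided by the RMS value."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


def kurtosis(x):
    """(8.90): fourth central moment divided by the squared variance."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2


sine = [math.sin(2 * math.pi * k / 40) for k in range(800)]            # 20 periods
impacts = [v + (5.0 if k % 100 == 0 else 0.0) for k, v in enumerate(sine)]
print(crest_factor(sine), kurtosis(sine))         # about 1.414 and 1.5
print(crest_factor(impacts), kurtosis(impacts))   # both clearly larger
```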


b) Frequency domain methods 

The classical way to analyze stationary harmonic vibration signals is the Fourier analysis described in Section 8.1.2, especially in the algorithmically efficient form of the fast Fourier transform (FFT). The resulting spectral peaks give an intuitive way to recognize changes caused by faults. Comparing them with the expected frequencies of, e.g., the ball bearings and gears discussed earlier in this section may then allow the respective faults to be isolated. However, the observed frequencies are not always uniquely related to distinct faults, [8.53].

The maximum entropy spectral estimation, Section 8.1.6, is recommended for automatically finding a few (2 to 5) distinct frequencies, although it needs more computation. Good results have been obtained for a grinding machine, [8.20], and a hacksawing machine, [8.31].

The envelope analysis method is suitable for extracting the damped impulse responses that result from impacts in ball/rolling bearings or toothed gears. Here, the eigenfrequencies ω_0e,ν of the machine modes are suppressed by a low-pass filter, or by a band-pass filter that includes the eigenfrequencies. After the magnitude |x_p(t)| of the filtered signal has been determined, only the positive part of the envelopes remains, see Figure 8.15b. This signal is then analyzed by a fast Fourier transform (FFT). In this way the impact frequency ω_imp and its higher harmonics are represented much better than in the FFT of the original signal x(t). In the original spectrum they may not be recognizable, because the machine modes contribute much more.
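The sketch below illustrates envelope analysis under stated assumptions. The signal repeats the damped impulse response (8.87) with f_0e = 2 kHz, δ = 300 1/s and an impact frequency of 50 Hz, sampled at 10 kHz. Only the most recent impact is kept, since earlier ones have decayed. The envelope is formed by rectification followed by a short circular moving average. This is one common variant; a Hilbert-transform envelope would serve equally well. In the raw spectrum the 50 Hz line is weak, while in the envelope spectrum it is the dominant peak.

```python
import cmath
import math

FS = 10_000.0     # sampling frequency in Hz (assumed)
F0E = 2_000.0     # structural eigenfrequency in Hz (assumed)
DELTA = 300.0     # decay constant delta in 1/s (assumed)
F_IMP = 50.0      # impact frequency in Hz (assumed)
N = 2000          # 0.2 s of data, exactly 10 impact periods


def impact_signal():
    """Damped impulse responses (8.87), one per impact period T_imp, cf. (8.88)."""
    per_period = int(FS / F_IMP)               # 200 samples per impact period
    x = []
    for k in range(N):
        tau = (k % per_period) / FS            # time since the latest impact
        # earlier impacts have decayed to exp(-DELTA * T_imp) ~ 0.0025 and are neglected
        x.append(math.exp(-DELTA * tau) * math.sin(2 * math.pi * F0E * tau))
    return x


def envelope(x, width=5):
    """Rectify, then smooth with a circular moving average over one carrier period."""
    n = len(x)
    rect = [abs(v) for v in x]
    env = [sum(rect[(k - j) % n] for j in range(width)) / width for k in range(n)]
    mean = sum(env) / n
    return [v - mean for v in env]             # remove the DC component


def amplitude_at(x, f):
    """Single-sided DFT amplitude of x at frequency f in Hz."""
    s = sum(v * cmath.exp(-2j * math.pi * f * k / FS) for k, v in enumerate(x))
    return 2.0 * abs(s) / len(x)


raw = impact_signal()
env = envelope(raw)
peak_f = max(range(10, 401, 5), key=lambda f: amplitude_at(env, f))
print("envelope spectrum peak at", peak_f, "Hz")
print("amplitude at 50 Hz: raw", round(amplitude_at(raw, 50), 4),
      "envelope", round(amplitude_at(env, 50), 4))
```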

c) Time-frequency domain analysis 

Time-frequency analysis methods first preprocess the data in the time domain. This improves the signal-to-noise ratio or enhances the signal content in certain frequency ranges.

A first approach applies autocorrelation to remove noise effects, or cross-correlation if the relation to another signal is of interest. The resulting time-dependent periodic signals are then analyzed with the FFT.

For non-stationary periodic signals, the short-time Fourier transform or the wavelet transform, described in Section 8.2, can be applied.

Vibration analysis can be used to detect faults in rotor systems. Fault models with equivalent forces were identified, and suitable locations for position and acceleration measurements were chosen. On this basis it was shown on a laboratory test rig that imbalances and larger shaft cracks can be detected, [8.39], [8.38].

This summary of some basic methods for the vibration analysis of machines, as depicted in Figure 8.13, shows that several choices require good knowledge of the individual machine properties. These include the design of the various filters, the selection of sampling time and measurement length, and the analysis methods and their combinations. The selection of the measurement equipment belongs to these choices as well. Examples of successful applications are given in [8.53], [8.25]. A comparison of different methods in [8.9] shows relatively similar results and gives recommendations for practical cases.

8.4.4 Speed signal analysis of combustion engines¹

Increasing demands on economy and reliability, and particularly the need to reduce exhaust gas emissions, are forcing vehicle manufacturers to develop suitable detection and diagnosis functions for combustion engines. Legal regulations have promoted supervision methods for all components of a passenger car that increase exhaust gas emissions when faulty. Examples are the On Board Diagnosis II (OBD II), introduced by the California Air Resources Board (CARB) in 1996 (California's OBD II Requirements), and the European On Board Diagnosis (EOBD), introduced by the European Union in 1998. Statistics, [8.30], have shown that an increase in exhaust gas emissions and a decrease in engine performance are, in most cases, caused by faults in the injection or mixture preparation. For spark-ignition engines, misfire detection is a very demanding task. A cylinder may misfire, e.g. due to faults in the mixture preparation or ignition system, so that no combustion or only incomplete combustion occurs. Unburned fuel then enters the exhaust system and burns in the hot catalytic converter. The released heat may damage or destroy the catalytic converter by thermal overloading. If a given misfire ratio is exceeded, the fuel supply for the misfiring cylinders can be cut off. This protects the catalytic converter from damage and avoids exceeding the emission standard. One way to detect misfiring cylinders is to evaluate the engine speed signal at the engine flywheel.

¹ Compiled by Frank Kimmich

The signal characteristics of a combustion engine are determined by the batch-wise behavior of the combustion, which depends on the crankshaft angle CA. Each cylinder of a four-stroke engine fires every 720°CA. This corresponds to one working cycle and defines the engine base period. All relevant signal components are multiples of this base frequency. During a working cycle each cylinder fires once, so a four-cylinder engine has a combustion every 180°CA. Let the engine angular speed, measured at the flywheel, be denoted by ω_E. The frequency of this oscillation is then the ignition frequency f_I, given by

$$f_I = \frac{\omega_E}{4\pi}\, i_c, \qquad i_c:\ \text{number of cylinders} \tag{8.91}$$
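The sketch below puts (8.91) into numbers. It computes the working-cycle base frequency and the ignition frequency at the idle speed of about 800 rpm mentioned for Figure 8.16, assuming a four-cylinder four-stroke engine. The function name is illustrative.

```python
import math


def engine_frequencies(n_rpm, i_c):
    """Base (working-cycle) frequency and ignition frequency (8.91) in Hz, four-stroke."""
    omega_e = 2.0 * math.pi * n_rpm / 60.0      # crankshaft angular speed in rad/s
    f_base = omega_e / (4.0 * math.pi)          # one working cycle = 720 deg CA
    return f_base, i_c * f_base


f_base, f_ign = engine_frequencies(800.0, 4)    # idle speed as in Fig. 8.16
print(f"base frequency {f_base:.3f} Hz, ignition frequency {f_ign:.3f} Hz")
harmonics = [m * f_base for m in range(1, 5)]   # engine orders 1..4
```

Misfires add components at the lower engine orders (multiples of f_base). Without misfires, only order i_c (here the fourth) is present.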

Figure 8.16 depicts a typical engine speed signal of a spark-ignition (SI) engine, measured at idle speed without misfiring. It shows speed oscillations at the ignition frequency around the mean engine speed (approx. 800 rpm).

If misfires or faults in the injection mass occur, the engine speed decreases significantly. Figure 8.17 shows the measured engine speed of a four-cylinder engine with continuous misfiring of one cylinder. Additional low-frequency oscillations then arise, as the low-pass filtered engine speed signal clearly shows. These frequency components are harmonics of the engine base frequency. Different misfiring cylinders produce different frequency patterns.



Fig. 8.16. Measured engine speed signal at idle speed of an SI engine (no misfires) 
Fig. 8.17. Measured engine speed signal and low-pass filtered signal at idle speed with misfires in cylinder 1


In the past few years, methods based on Fourier analysis and the fast Fourier transform have been investigated for evaluating these frequency components, see [8.44]. Figure 8.18 shows the Fourier transforms of both engine speed signals (no misfire and misfire in cylinder 1). Without misfire, only the ignition frequency, i.e. the fourth engine harmonic, appears in the spectrum. In the case of misfires, additional frequency components arise. By evaluating these frequency components, both the misfires and the misfiring cylinder can be detected and located.



Fig. 8.18. Fourier transform of the engine speed signal without and with misfires in cylinder 1 
Another method, [8.11], uses the real and imaginary components of the discrete Fourier transform (DFT) of the engine speed signal. A four-stroke four-cylinder engine is considered in the following. The principle of the method was also implemented successfully in a six-cylinder spark-ignition engine, up to 6000 rpm and for loads higher than 20%.

Since the engine is time-variant, the data are acquired crank-angle synchronously, so no adaptation of the sampling time is necessary. For the calculation of the DFT, the data are sampled every 90°CA. This corresponds to twice the ignition frequency and thus satisfies the Shannon sampling theorem. For a four-stroke four-cylinder engine, the resulting speed-dependent sampling time follows from the ignition frequency:

$$T_0 = \frac{1}{2 f_I} = \frac{\pi}{2\,\omega_E} \tag{8.92}$$

The DFT can now be evaluated with only eight sampling points per working cycle (720°CA). Only a few sampling points N have to be processed, so a real-time implementation is easy. The amplitudes and phase angles of the DFT can be calculated as follows:

$$A_m = \frac{1}{N}\sqrt{\left(\sum_{i=1}^{N}\omega_i\cos\frac{2\pi m i}{N}\right)^2 + \left(\sum_{i=1}^{N}\omega_i\sin\frac{2\pi m i}{N}\right)^2} \tag{8.93}$$

$$\varphi_m = \arctan\frac{-\sum_{i=1}^{N}\omega_i\sin\frac{2\pi m i}{N}}{\sum_{i=1}^{N}\omega_i\cos\frac{2\pi m i}{N}} \tag{8.94}$$

where m denotes the order and ω_i the sampled engine speed. Because the combustion usually varies from cycle to cycle in a non-cyclic way, an average value over several working cycles can be calculated from the measured data.

The faults considered are misfires or combustion differences in one or two cylinders. Six different patterns P have to be distinguished for the relative location of the misfiring cylinders to each other:

P0: no fault

P1: one cylinder oversupplied

P2: one cylinder undersupplied

P3: two subsequent cylinders undersupplied

P4: two opposite cylinders undersupplied

PX: undetectable.

To locate the misfiring cylinders, only the first and second engine harmonics (m = 1 and m = 2) have to be evaluated, see also Figure 8.19. The real and imaginary components of these two harmonics are zero without misfires and nonzero with misfires. For pattern recognition and misfire detection, the amplitude values and the real and imaginary components have to be compared. In addition, two thresholds T1 and T2, which depend on engine speed and load, have to be determined. The flowchart in Figure 8.19 shows the signal flow for monitoring and diagnosing the possible fault patterns. Each fault case produces a different vector pattern, from which the defective cylinders can be identified. The fault diagnosis is thus performed by pattern recognition on the amplitudes and phases of the DFT.
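The sketch below computes the features (8.93) and (8.94) from eight crank-angle-synchronous speed samples per working cycle. The speed model is hypothetical. Each 180°CA segment (two samples) follows one combustion, and a misfire lowers the speed in that segment. The 1/N scaling is assumed, and atan2 is used instead of arctan to resolve the quadrant of φ_m. The thresholds T1, T2 and the full decision logic of Figure 8.19 are not reproduced. Without misfire, the first two orders vanish. With a misfire, the phase of order 1 rotates by −90° from one segment to the next, which locates the segment.

```python
import math


def order_component(samples, m):
    """Amplitude A_m and phase phi_m of order m, following (8.93) and (8.94).

    The 1/N scaling is an assumption; atan2 resolves the quadrant of phi_m.
    """
    n = len(samples)
    c = sum(w * math.cos(2 * math.pi * m * i / n) for i, w in enumerate(samples, start=1))
    s = sum(w * math.sin(2 * math.pi * m * i / n) for i, w in enumerate(samples, start=1))
    return math.hypot(c, s) / n, math.atan2(-s, c)


def speed_samples(misfire_segment=None, mean=83.8, ripple=1.0, drop=2.0):
    """Hypothetical speed samples (rad/s), one every 90 deg CA, one working cycle.

    Each 180 deg CA segment (two samples) follows one combustion; a misfire
    lowers the speed in that segment by `drop`.
    """
    samples = []
    for i in range(8):
        w = mean + ripple * (-1) ** i          # oscillation at the ignition frequency
        if misfire_segment is not None and i // 2 == misfire_segment:
            w -= drop
        samples.append(w)
    return samples


healthy = speed_samples()
print("no misfire:", [round(order_component(healthy, m)[0], 6) for m in (1, 2)])
for seg in range(4):
    a1, phi1 = order_component(speed_samples(seg), 1)
    print(f"misfire in segment {seg}: A_1 = {a1:.3f}, phi_1 = {math.degrees(phi1):7.1f} deg")
```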

The performance of the proposed method is limited, on the one hand, by faults in the data acquisition (for example, measurement errors) and, on the other hand, by disturbances superimposed on the measured signal. The method can be used for misfire detection and also for monitoring smooth engine operation in the whole operating range, except at very low loads and high speeds. [8.51] developed a similar approach based on the measured exhaust gas pressure.



Fig. 8.19. Scheme for detection of misfires and diagnosis of the faulty cylinders 
8.5 Problems

1) State three typical tasks for fault detection with signal models in the areas of machine tools, turbo engines and car suspensions. Which measurements should be made? Which signal-analysis methods can be applied?

2) A passenger car shows strong vibrations in the steering wheel at around 80 km/h. What can be the fault? Which variables should be measured to diagnose the fault?

3) The seat of a truck driver is equipped with an accelerometer. Which signal-analysis methods can be used to determine the sources of the observed fluctuations?

4) State the advantage of the FFT compared to the DFT.

5) What are the advantages of the wavelet transform compared to the short-time Fourier transform?

6) What is the lowest frequency of the speed signal of a six-cylinder four-stroke combustion engine if one cylinder shows misfires (n = 3000 rpm)?

7) What are the features for fault detection when stochastic signals are analyzed with correlation analysis and ARMA parameter estimation?

8) A rotating machine shows a basic displacement vibration with f = 5 Hz and amplitude 2 mm. Calculate the corresponding vibration speed v(t) and acceleration a(t).

9) A pair of gear-wheels has 20 and 50 teeth, and the 20-teeth wheel rotates with n = 3000 rpm. Calculate the tooth contact frequency f_t and angular frequency ω_t. Which frequencies can be observed if the smaller wheel is not centered? If the acceleration at the gear casing is measured, which methods can detect these frequencies caused by the non-centered fault?
Fault detection with process-identification methods


As described in Chapter 5, mathematical process models describe the relationship between input signals u(k) and output signals y(k) and are fundamental for model-based fault detection. In many cases the process models are not known at all, or some parameters are unknown. Furthermore, the models have to be rather precise in order to express deviations resulting from process faults. Process-identification methods therefore frequently have to be applied before any model-based fault-detection method can be used. The identification method itself may also provide information, e.g. on process parameters that change under the influence of faults. The first known publications on fault detection with identification methods are [9.24], [9.4], [9.26], [9.27], [9.28], [9.14] and [9.13].

This chapter gives a brief introduction to the most important identification methods for linear and nonlinear single-input single-output processes that are relevant for fault detection. Table 9.1 shows a survey of the most important identification methods.

Figure 9.1 shows the methods that can be applied to a wide range of processes and input excitation signals. Especially for dynamic processes, the input signals have to change periodically, stochastically or in the form of special test signals. These signals may be the normal operating signals, as in servo systems, actuators or driving vehicles. They may also be introduced artificially for testing, either after fault detection with other methods or for quality control during manufacturing or maintenance. A considerable advantage of identification methods is that several parameters (up to about six) can be estimated from only one input and one output signal. These parameters give a detailed picture of internal process quantities. The generated features for fault detection are impulse response values for correlation methods and parameter estimates otherwise. Other identification methods, like step response model matching or frequency response measurement, need special excitation and evaluation procedures. They are therefore suitable for automatic fault detection only in special cases.

The identification methods are first described for discrete-time and continuous-time linear processes, both in open loop and in closed loop, and then for nonlinear processes.



Table 9.1. Survey of important identification methods. TVS: time-variant systems; MIMO: multi-input multi-output systems; NLS: non-linear systems

Fig. 9.1. Survey of identification methods


9.1 Identification with correlation functions 

If stationary stochastic signals act on a linear process in open loop, then the impulse 
response g(v) can be determined if the autocorrelation function of the input signal 
and the cross-correlation function of the input and output signal are known. 

9.1.1 Estimation of correlation functions 

The autocorrelation function (ACF) is defined by

$$\Phi_{uu}(\tau) = \mathrm{E}\{u(k)\,u(k+\tau)\} = \lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1} u(k)\,u(k+\tau) \tag{9.1}$$

For finite samples of measured signals an estimate is given by

$$\hat{\Phi}_{uu}(\tau) = \frac{1}{N}\sum_{k=0}^{N-1} u(k)\,u(k+\tau) \tag{9.2}$$

For the cross-correlation function (CCF)

$$\Phi_{uy}(\tau) = \mathrm{E}\{u(k)\,y(k+\tau)\} = \lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1} u(k)\,y(k+\tau) \tag{9.3}$$

the estimate is

$$\hat{\Phi}_{uy}(\tau) = \frac{1}{N}\sum_{k=0}^{N-1} u(k)\,y(k+\tau) \tag{9.4}$$

Writing the CCF estimate up to the time points N and N − 1 and subtracting the two equations gives the recursive form

$$\hat{\Phi}_{uy}(\tau,N) = \hat{\Phi}_{uy}(\tau,N-1) + \frac{1}{N+1}\left[\,u(N-\tau)\,y(N) - \hat{\Phi}_{uy}(\tau,N-1)\right] \tag{9.5}$$

that is, new estimate = old estimate + correction factor × [new product − old estimate].

For finite N the correlation function estimates contain a bias. However, this bias vanishes as N → ∞, so the estimates are consistent. Because the variance also converges to zero, the estimates are consistent in mean square, see, for example, [9.30].

9.1.2 Convolution 

If E{u(k)} = 0 and E{y(k)} = 0, the correlation functions of the input and output signals of a linear process are related by the convolution sum

$$\Phi_{uy}(\tau) = \sum_{\nu=0}^{\infty} g(\nu)\,\Phi_{uu}(\tau-\nu) \tag{9.6}$$

If l + 1 values of g(ν) have to be determined and the convolution sum is truncated for ν > l, then

$$\Phi_{uy}(\tau) \approx \boldsymbol{\varphi}_{uu}^T(\tau)\,\mathbf{g} \tag{9.7}$$

where

$$\boldsymbol{\varphi}_{uu}^T(\tau) = [\,\Phi_{uu}(\tau)\ \ \Phi_{uu}(\tau-1)\ \dots\ \Phi_{uu}(\tau-l)\,], \qquad \mathbf{g}^T = [\,g(0)\ \ g(1)\ \dots\ g(l)\,]$$

A unique solution requires l + 1 equations. Therefore τ is varied within

$$-P \le \tau \le -P + l$$

and the equation system becomes

$$\boldsymbol{\varphi}_{uy} \approx \boldsymbol{\Phi}_{uu}\,\mathbf{g} \tag{9.8}$$

Since Φ_uu is an (l + 1) × (l + 1) square matrix, the impulse response estimates follow from the deconvolution equation

$$\hat{\mathbf{g}} = \boldsymbol{\Phi}_{uu}^{-1}\,\boldsymbol{\varphi}_{uy} \tag{9.9}$$

Of course, Φ_uu can be inverted only if

$$\det\boldsymbol{\Phi}_{uu} \neq 0 \tag{9.10}$$

which is an identifiability condition. In other words, the process must be persistently excited (of order l + 1).
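The sketch below shows correlation-based identification under stated assumptions. The data come from a hypothetical FIR process with l = 4 and small output noise. The binary input is colored, so Φ_uu is not diagonal and the deconvolution (9.9) actually matters. The lags are τ = 0, ..., l (i.e. P = 0), and a small Gaussian elimination replaces the explicit inverse. All names are illustrative.

```python
import random


def corr(a, b, tau):
    """Correlation estimate (9.2)/(9.4): (1/N) sum_k a(k) b(k + tau) over available samples."""
    n = len(a)
    ks = range(n - tau) if tau >= 0 else range(-tau, n)
    return sum(a[k] * b[k + tau] for k in ks) / n


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for A x = rhs."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("det(Phi_uu) = 0: input not persistently exciting, cf. (9.10)")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def estimate_impulse_response(u, y, l):
    """Deconvolution (9.9) with tau = 0, ..., l (i.e. P = 0)."""
    phi_uu = {d: corr(u, u, d) for d in range(-l, l + 1)}
    Phi_uu = [[phi_uu[tau - nu] for nu in range(l + 1)] for tau in range(l + 1)]
    phi_uy = [corr(u, y, tau) for tau in range(l + 1)]
    return solve(Phi_uu, phi_uy)


rng = random.Random(3)
g_true = [0.0, 1.0, 0.6, 0.3, 0.1]                 # hypothetical process, l = 4
N = 10000
e = [rng.choice((-1.0, 1.0)) for _ in range(N + 1)]
u = [e[k + 1] + 0.5 * e[k] for k in range(N)]      # colored input, Phi_uu not diagonal
y = [sum(g_true[nu] * u[k - nu] for nu in range(len(g_true)) if k - nu >= 0)
     + 0.1 * rng.gauss(0.0, 1.0) for k in range(N)]
g_hat = estimate_impulse_response(u, y, l=len(g_true) - 1)
print([round(g, 3) for g in g_hat])
```

For a white input, Φ_uu becomes (nearly) diagonal, and the solution reduces to the proportionality between impulse response and CCF noted in the text.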

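If the input signal u(k) is white noise with ACF

$$\Phi_{uu}(\tau) = \sigma_u^2\,\delta(\tau) = \Phi_{uu}(0)\,\delta(\tau)$$

it follows from (9.6) that

$$\hat{g}(\tau) = \frac{1}{\Phi_{uu}(0)}\,\hat{\Phi}_{uy}(\tau)$$

The impulse response is then proportional to the CCF.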

If stochastic, stationary noise n(k) acts on the process output, the necessary conditions for the consistent estimation of g(τ) in mean square are the following:

• u(k) and y(k) are stationary;

• E{u(k)} = 0;

• u(k) is persistently exciting;

• n(k) is not correlated with u(k).

For more details see [9.12], [9.30].


9.2 Parameter estimation for linear processes 

It is assumed that the process can be described by the linear difference equation

$$y_u(k) + a_1 y_u(k-1) + \ldots + a_m y_u(k-m) = b_1 u(k-d-1) + \ldots + b_m u(k-d-m) \qquad (9.11)$$

Here,

$$u(k) = U(k) - U_{00}, \qquad y_u(k) = Y_u(k) - Y_{00} \qquad (9.12)$$

are the deviations of the absolute signals $U(k)$ and $Y_u(k)$ from the operating point described by $U_{00}$ and $Y_{00}$, $k$ is the discrete time $k = t/T_0 = 0, 1, 2, \ldots$, $T_0$ is the sampling time and $d = T_t/T_0 = 0, 1, 2, \ldots$ is the discrete dead time of the process. The corresponding transfer function in the z-domain is

$$G_P(z) = \frac{y_u(z)}{u(z)} = \frac{B(z^{-1})}{A(z^{-1})}\,z^{-d} = \frac{b_1 z^{-1} + \ldots + b_m z^{-m}}{1 + a_1 z^{-1} + \ldots + a_m z^{-m}}\,z^{-d} \qquad (9.13)$$

The measured signal contains a stationary, stochastic disturbance

$$y(k) = y_u(k) + n(k) \quad\text{with}\quad E\{n(k)\} = 0 \qquad (9.14)$$

The task is to determine the unknown parameters $a_i$ and $b_i$ from $N$ measured input and output signal data points, see Figure 9.2.
9.2.1 Method of least squares (LS)
a) Equation error methods

Let the model parameters obtained from the data up to sample $(k-1)$ be denoted by $\hat a_i$ and $\hat b_i$. Then, in the presence of a disturbed output signal, (9.11) becomes

$$y(k) + \hat a_1 y(k-1) + \ldots + \hat a_m y(k-m) - \hat b_1 u(k-d-1) - \ldots - \hat b_m u(k-d-m) = e(k) \qquad (9.15)$$

where the equation error (residual) $e(k)$ is introduced instead of "0". This error corresponds to a generalized error, as can be seen by rewriting (9.15) in the z-domain, compare Figure 9.2a:

$$\hat A(z^{-1})\,y(z) - \hat B(z^{-1})\,z^{-d}\,u(z) = e(z) \qquad (9.16)$$

The error $e$ depends linearly on the parameters sought (it is linear in the parameters).



Fig. 9.2. Model structures for parameter estimation: (a) equation error; (b) output error 


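From (9.15), $\hat y(k|k-1)$ can be interpreted as the one-step-ahead prediction based on the measurements up to sample $(k-1)$

$$\hat y(k|k-1) = \boldsymbol\psi^T(k)\,\hat{\boldsymbol\theta}(k-1) \qquad (9.17)$$

with the data vector

$$\boldsymbol\psi^T(k) = [-y(k-1)\;\ldots\;-y(k-m)\;|\;u(k-d-1)\;\ldots\;u(k-d-m)] \qquad (9.18)$$

and the parameter vector

$$\hat{\boldsymbol\theta} = [\hat a_1\;\ldots\;\hat a_m\;|\;\hat b_1\;\ldots\;\hat b_m]^T \qquad (9.19)$$

Consequently, (9.15) can be written as

$$y(k) = \boldsymbol\psi^T(k)\,\hat{\boldsymbol\theta}(k-1) + e(k) \qquad (9.20)$$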
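The measured signals for $k = m+d, \ldots, m+d+N$ are written in vectors, e.g.

$$\mathbf y^T(m+d+N) = [y(m+d)\;\;\ldots\;\;y(m+d+N)] \qquad (9.21)$$

Then,

$$\mathbf y(m+d+N) = \boldsymbol\Psi(m+d+N)\,\hat{\boldsymbol\theta} + \mathbf e(m+d+N) \qquad (9.22)$$

where $\boldsymbol\Psi$ is an $((N+1)\times 2m)$ data matrix whose rows are $\boldsymbol\psi^T(k)$. Minimizing the sum of squared errors

$$V = \sum_{k=m+d}^{m+d+N} e^2(k) = \mathbf e^T(m+d+N)\,\mathbf e(m+d+N) \qquad (9.23)$$

yields

$$\left.\frac{dV}{d\boldsymbol\theta}\right|_{\boldsymbol\theta=\hat{\boldsymbol\theta}} = -2\,\boldsymbol\Psi^T\,[\mathbf y - \boldsymbol\Psi\hat{\boldsymbol\theta}] = \mathbf 0 \qquad (9.24)$$

for the unknown parameters. From this, the (nonrecursive) estimation equation of the least squares (LS) method is obtained

$$\hat{\boldsymbol\theta} = [\boldsymbol\Psi^T\boldsymbol\Psi]^{-1}\,\boldsymbol\Psi^T\mathbf y \qquad (9.25)$$

The matrix

$$\mathbf P = [\boldsymbol\Psi^T\boldsymbol\Psi]^{-1} \qquad (9.26)$$

has the dimension $(2m \times 2m)$. The inverse exists if and only if

$$\det[\boldsymbol\Psi^T\boldsymbol\Psi] = \det\mathbf P^{-1} \ne 0 \qquad (9.27)$$

Also,

$$\frac{\partial^2 V}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta^T} = 2\,\boldsymbol\Psi^T\boldsymbol\Psi \qquad (9.28)$$

has to be positive definite such that the loss function $V$ has a minimum. Since $\boldsymbol\Psi^T\boldsymbol\Psi$ is always positive semidefinite, both requirements are satisfied if and only if

$$\det[\boldsymbol\Psi^T\boldsymbol\Psi] = \det\mathbf P^{-1} > 0 \qquad (9.29)$$

This condition implies that the input signal persistently excites the process and that the process is stable.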

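Parameter estimation methods are usually required to deliver estimates that are unbiased for a finite number of data samples $N$

$$E\{\hat{\boldsymbol\theta}(N)\} = \boldsymbol\theta_0 \qquad (9.30)$$

($\boldsymbol\theta_0$ denotes the true parameters) and consistent in the quadratic mean

$$\lim_{N\to\infty} E\{\hat{\boldsymbol\theta}(N)\} = \boldsymbol\theta_0 \qquad (9.31)$$

$$\lim_{N\to\infty} E\{[\hat{\boldsymbol\theta}(N)-\boldsymbol\theta_0][\hat{\boldsymbol\theta}(N)-\boldsymbol\theta_0]^T\} = \mathbf 0 \qquad (9.32)$$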
For the least squares method, (9.31) becomes, by substituting (9.22) into (9.25),

$$E\{\hat{\boldsymbol\theta}(N)\} = \boldsymbol\theta_0 + E\{[\boldsymbol\Psi^T\boldsymbol\Psi]^{-1}\boldsymbol\Psi^T\mathbf e\} = \boldsymbol\theta_0 + \mathbf b \qquad (9.33)$$

In order to have a vanishing bias (systematic estimation error) $\mathbf b$, $\boldsymbol\Psi$ and $\mathbf e$ must be uncorrelated. Consequently, $e(k)$ must be uncorrelated (white) and $E\{e(k)\} = 0$. The estimate is unbiased if the disturbance signal $n(k)$ is generated by the disturbance filter

$$G_v(z) = \frac{n(z)}{v(z)} = \frac{1}{A(z^{-1})} \qquad (9.34)$$

where $v(k)$ is discrete white noise, see Figure 9.3. Since real disturbances are practically never generated by this particular filter, least squares estimation in general yields biased estimates. These systematic estimation errors are the larger, the greater the variance $\sigma_n^2$ of the disturbance signal is compared to the variance $\sigma_y^2$ of the output signal.



Fig. 9.3. Model configuration for the least squares method with equation error (generalized 
error) 


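For the covariance matrix of the parameter estimation errors, the following holds if $E\{\hat{\boldsymbol\theta}\} = \boldsymbol\theta_0$ (i.e. $\mathbf b = \mathbf 0$):

$$\operatorname{cov}[\Delta\boldsymbol\theta] = E\{[\hat{\boldsymbol\theta}-\boldsymbol\theta_0][\hat{\boldsymbol\theta}-\boldsymbol\theta_0]^T\} = \sigma_e^2\,E\{\mathbf P\} = \frac{\sigma_e^2}{N+1}\,E\left\{\left[\frac{1}{N+1}\boldsymbol\Psi^T\boldsymbol\Psi\right]^{-1}\right\} = \frac{\sigma_e^2}{N+1}\,E\{\boldsymbol\Phi^{-1}(N+1)\} \qquad (9.35)$$

$\sigma_e^2$ is the variance of $e(k)$, and $\boldsymbol\Phi$ is a matrix whose elements are correlation function estimates. For $N\to\infty$, (9.32) is satisfied. $E\{\mathbf P\}$ is proportional to the covariance matrix of the parameter estimation errors.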

Because of its biased estimates, the least squares algorithm can only be used for processes with no or only small disturbance signals. A big advantage of the least squares algorithm, however, is that the parameter vector $\hat{\boldsymbol\theta}$ can be determined in one batch calculation and no iterative methods are necessary. This is possible since the employed error measure is linear in the parameters.
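A minimal batch LS sketch of (9.21) to (9.29) in Python/NumPy follows. The second-order process, its parameters and the helper names (`simulate_arx`, `data_matrix`, `ls_estimate`) are assumptions for illustration only. The disturbance is chosen as white equation error, which is exactly the case in which (9.33) gives no bias.

```python
import numpy as np

def simulate_arx(a, b, d, u, e):
    """y(k) = -a1 y(k-1) - ... - am y(k-m) + b1 u(k-d-1) + ... + bm u(k-d-m) + e(k)."""
    m = len(a)
    y = np.zeros(len(u))
    for k in range(len(u)):
        acc = e[k]
        for i in range(1, m + 1):
            if k - i >= 0:
                acc -= a[i - 1] * y[k - i]
            if k - d - i >= 0:
                acc += b[i - 1] * u[k - d - i]
        y[k] = acc
    return y

def data_matrix(y, u, m, d):
    """Stack rows psi^T(k) of (9.18) for k = m+d, ..., len(y)-1 (data matrix of (9.22))."""
    Psi = np.array([[-y[k - i] for i in range(1, m + 1)] +
                    [u[k - d - i] for i in range(1, m + 1)] for k in range(m + d, len(y))])
    return Psi, y[m + d:]

def ls_estimate(Psi, y):
    """(9.25); solving the normal equations avoids forming [Psi^T Psi]^-1 explicitly."""
    M = Psi.T @ Psi
    if np.linalg.det(M) <= 0:                       # condition (9.29)
        raise ValueError("Psi^T Psi not positive definite: insufficient excitation")
    return np.linalg.solve(M, Psi.T @ y)

rng = np.random.default_rng(2)
a_true, b_true, d = [-1.5, 0.7], [1.0, 0.5], 1      # assumed process
u = rng.choice([-1.0, 1.0], size=2000)
e = 0.05 * rng.standard_normal(2000)                # white equation error
y = simulate_arx(a_true, b_true, d, u, e)
Psi, Y = data_matrix(y, u, m=2, d=d)
theta_hat = ls_estimate(Psi, Y)
print(np.round(theta_hat, 3))                       # close to [-1.5, 0.7, 1.0, 0.5]
```

If the disturbance were instead added to the output as in (9.14), the same code would show the bias discussed around (9.34).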

b) Output error methods

Instead of the equation error, the output error

$$e'(k) = y(k) - y_M(\hat{\boldsymbol\theta}, k) \qquad (9.36)$$

can be used, where

$$y_M(\hat{\boldsymbol\theta}, z) = \frac{\hat B(z^{-1})}{\hat A(z^{-1})}\,z^{-d}\,u(z) \qquad (9.37)$$

is the model output, see Figure 9.2b. But then no direct calculation of the parameter estimates $\hat{\boldsymbol\theta}$ is possible, because $e'(k)$ is nonlinear in the parameters. Therefore the loss function (9.23), formed with $e'(k)$, is minimized by a numerical optimization method, e.g. the downhill simplex method. The computational effort is then larger, and on-line real-time application is in general not possible. However, relatively precise parameter estimates may be obtained, [9.10], [9.50].


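c) Recursive least squares (RLS) methods

Writing the nonrecursive estimation equations for $\hat{\boldsymbol\theta}(k+1)$ and $\hat{\boldsymbol\theta}(k)$ and subtracting one from the other results in the recursive parameter estimation algorithm

$$\underbrace{\hat{\boldsymbol\theta}(k+1)}_{\text{new estimate}} = \underbrace{\hat{\boldsymbol\theta}(k)}_{\text{old estimate}} + \underbrace{\boldsymbol\gamma(k)}_{\text{correction factor}}\Big[\underbrace{y(k+1)}_{\text{new measurement}} - \underbrace{\boldsymbol\psi^T(k+1)\,\hat{\boldsymbol\theta}(k)}_{\text{one-step-ahead prediction of the new measurement}}\Big] \qquad (9.38)$$

The correcting vector is given by

$$\boldsymbol\gamma(k) = \mathbf P(k+1)\,\boldsymbol\psi(k+1) = \frac{1}{\boldsymbol\psi^T(k+1)\,\mathbf P(k)\,\boldsymbol\psi(k+1) + 1}\,\mathbf P(k)\,\boldsymbol\psi(k+1) \qquad (9.39)$$

and

$$\mathbf P(k+1) = [\mathbf I - \boldsymbol\gamma(k)\,\boldsymbol\psi^T(k+1)]\,\mathbf P(k) \qquad (9.40)$$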
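To start the recursive algorithm one sets

$$\hat{\boldsymbol\theta}(0) = \mathbf 0, \qquad \mathbf P(0) = \alpha\,\mathbf I \qquad (9.41)$$

with large $\alpha$ ($\alpha = 100 \ldots 1000$). The expectation of the matrix $\mathbf P$ is proportional to the covariance matrix of the parameter estimates

$$E\{\mathbf P(k+1)\} = \frac{1}{\sigma_e^2}\,\operatorname{cov}[\Delta\boldsymbol\theta(k)] \qquad (9.42)$$

with

$$\sigma_e^2 = E\{e^2(k)\} \qquad (9.43)$$

and the parameter error

$$\Delta\boldsymbol\theta(k) = \hat{\boldsymbol\theta}(k) - \boldsymbol\theta_0 \qquad (9.44)$$

Hence, the recursive algorithm contains the variances of the parameter estimates (diagonal elements of the covariance matrix), up to the factor $1/\sigma_e^2$. (9.38) can also be written as

$$\hat{\boldsymbol\theta}(k+1) = \hat{\boldsymbol\theta}(k) + \boldsymbol\gamma(k)\,e(k+1), \qquad e(k+1) = y(k+1) - \boldsymbol\psi^T(k+1)\,\hat{\boldsymbol\theta}(k) \qquad (9.45)$$

To improve the numerical properties of the basic RLS algorithm, modified versions are recommended, see Section 9.2.3.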
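The basic RLS recursion (9.38) to (9.41) can be sketched in a few lines of Python/NumPy. The first-order test process and the name `rls` are assumptions for illustration. The sketch also compares the result with a batch LS solution: with large $\alpha$ the two agree closely, because starting with $\mathbf P(0) = \alpha\mathbf I$ only adds a small regularization term $\mathbf I/\alpha$ to $\boldsymbol\Psi^T\boldsymbol\Psi$.

```python
import numpy as np

def rls(Psi, y, alpha=1000.0):
    """Recursive least squares (9.38)-(9.41), started with theta(0) = 0, P(0) = alpha I."""
    n = Psi.shape[1]
    theta = np.zeros(n)
    P = alpha * np.eye(n)
    for psi, y_new in zip(Psi, y):
        Ppsi = P @ psi
        gamma = Ppsi / (psi @ Ppsi + 1.0)       # correcting vector (9.39)
        e = y_new - psi @ theta                 # new measurement minus one-step-ahead prediction
        theta = theta + gamma * e               # (9.38), (9.45)
        P = P - np.outer(gamma, Ppsi)           # (9.40): [I - gamma psi^T] P, P symmetric
    return theta, P

rng = np.random.default_rng(3)
N = 1000
u = rng.choice([-1.0, 1.0], size=N)
y = np.zeros(N)
for k in range(1, N):                           # assumed process: a1 = -0.8, b1 = 0.5, d = 0
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1] + 0.05 * rng.standard_normal()
Psi = np.column_stack([-y[:-1], u[:-1]])        # psi^T(k) = [-y(k-1), u(k-1)]
theta_rls, P = rls(Psi, y[1:])
theta_ls = np.linalg.lstsq(Psi, y[1:], rcond=None)[0]
print(theta_rls, theta_ls)                      # practically identical
```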

DC value estimation

As for process parameter estimation the variations $u(k)$ and $y(k)$ of the measured signals $U(k)$ and $Y(k)$ have to be used, the DC (direct current or steady-state) values $U_{00}$ and $Y_{00}$ either have to be estimated as well or have to be removed. The following methods are available.

Differencing

The easiest way to obtain the variations without knowing the DC values is just to take the differences

$$U(k) - U(k-1) = u(k) - u(k-1) = \Delta u(k)$$

$$Y(k) - Y(k-1) = y(k) - y(k-1) = \Delta y(k) \qquad (9.46)$$

Instead of $u(z)$ and $y(z)$, the signals $\Delta u(z) = u(z)(1 - z^{-1})$ and $\Delta y(z) = y(z)(1 - z^{-1})$ are then used for the parameter estimation. As this special high-pass filtering is applied to both the process input and output, the process parameters can be estimated in the same way as when $u(k)$ and $y(k)$ are measured directly: in the parameter estimation algorithms, $u(k)$ and $y(k)$ are replaced by $\Delta u(k)$ and $\Delta y(k)$. However, if the DC values are required explicitly, other methods have to be used.
Averaging

The DC values can be estimated simply by averaging:

$$\hat Y_{00} = \frac{1}{M}\sum_{k=1}^{M} Y(k) \qquad (9.47)$$

The recursive version of this is

$$\hat Y_{00}(k) = \hat Y_{00}(k-1) + \frac{1}{k}\,[Y(k) - \hat Y_{00}(k-1)] \qquad (9.48)$$

For slowly time-varying DC values, recursive averaging with exponential forgetting leads to

$$\hat Y_{00}(k) = \lambda\,\hat Y_{00}(k-1) + (1-\lambda)\,Y(k) \qquad (9.49)$$

with $\lambda < 1$. The same applies to $U_{00}$. The variations $u(k)$ and $y(k)$ can then be determined by (9.12).
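Both averaging recursions can be illustrated with a short Python/NumPy sketch. The drifting operating point (5.0 changing to 6.0), the noise level and $\lambda = 0.98$ are assumptions chosen only to show the difference between (9.48), which weights all samples equally, and (9.49), which forgets old samples.

```python
import numpy as np

def dc_recursive_mean(Y):
    """(9.48): Y00(k) = Y00(k-1) + 1/k [Y(k) - Y00(k-1)], k = 1, 2, ..."""
    est, out = 0.0, np.empty(len(Y))
    for k, Yk in enumerate(Y, start=1):
        est += (Yk - est) / k
        out[k - 1] = est
    return out

def dc_forgetting(Y, lam=0.98):
    """(9.49): Y00(k) = lam Y00(k-1) + (1 - lam) Y(k); here started with the first sample."""
    est, out = Y[0], np.empty(len(Y))
    for k, Yk in enumerate(Y):
        est = lam * est + (1.0 - lam) * Yk
        out[k] = est
    return out

rng = np.random.default_rng(4)
# assumed operating point that drifts from 5.0 to 6.0 halfway through
Y = np.concatenate([5.0 + 0.1 * rng.standard_normal(500),
                    6.0 + 0.1 * rng.standard_normal(500)])
mean_rec = dc_recursive_mean(Y)
mean_fgt = dc_forgetting(Y)
print(mean_rec[-1], mean_fgt[-1])   # about 5.5 (all data) vs. about 6.0 (recent data)
```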


Implicit estimation of a constant

The estimation of the DC values $U_{00}$ and $Y_{00}$ can also be included in the parameter estimation. Substituting (9.12) into (9.11) results in

$$Y(k) = -a_1 Y(k-1) - \ldots - a_m Y(k-m) + b_1 U(k-d-1) + \ldots + b_m U(k-d-m) + C \qquad (9.50)$$

where

$$C = (1 + a_1 + \ldots + a_m)\,Y_{00} - (b_1 + \ldots + b_m)\,U_{00} \qquad (9.51)$$

Extending the parameter vector $\hat{\boldsymbol\theta}$ by the element $C$ and the data vector $\boldsymbol\psi^T(k)$ by the constant 1, the measured $Y(k)$ and $U(k)$ can be used directly for the estimation, and $C$ is estimated as well. Then, for one given DC value the other can be calculated using (9.51). For closed-loop identification it is convenient to use

$$Y_{00} = W(k) \qquad (9.52)$$
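A minimal sketch of the implicit method in Python/NumPy: the data vector is extended by a constant 1, so LS estimates $C$ together with $a_1$ and $b_1$, and $Y_{00}$ is then recovered from (9.51) for a known $U_{00}$. The first-order process and the operating point $U_{00} = 2$, $Y_{00} = 7$ are assumed values, which give $C = (1 + a_1)Y_{00} - b_1 U_{00} = 0.4$.

```python
import numpy as np

rng = np.random.default_rng(5)
a1, b1 = -0.8, 0.5            # assumed first-order process, d = 0
U00, Y00 = 2.0, 7.0           # assumed operating point
N = 2000
u = rng.choice([-1.0, 1.0], size=N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a1 * y[k - 1] + b1 * u[k - 1] + 0.01 * rng.standard_normal()
U, Y = U00 + u, Y00 + y        # absolute (measured) signals, cf. (9.12)

# (9.50): data vector extended by the constant 1, parameter vector by C
Psi = np.column_stack([-Y[:-1], U[:-1], np.ones(N - 1)])
a1_hat, b1_hat, C_hat = np.linalg.lstsq(Psi, Y[1:], rcond=None)[0]

# (9.51) solved for Y00, with U00 assumed known
Y00_hat = (C_hat + b1_hat * U00) / (1.0 + a1_hat)
print(a1_hat, b1_hat, C_hat, Y00_hat)
```

Note that $Y_{00}$ is obtained by dividing by $1 + \hat a_1$, which is small for slow processes, so errors in $\hat a_1$ are amplified in this step.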

Explicit estimation of a constant

The parameters $\hat a_i$ and $\hat b_i$ for the dynamic behavior and the DC constant $C$ can also be estimated separately. First the dynamic parameters are estimated using the differencing method above. Then with

$$L(k) = Y(k) + \hat a_1 Y(k-1) + \ldots + \hat a_m Y(k-m) - \hat b_1 U(k-d-1) - \ldots - \hat b_m U(k-d-m) \qquad (9.53)$$

the equation becomes

$$e(k) = L(k) - C \qquad (9.54)$$

and applying the LS method gives

$$\hat C(m+d+N) = \frac{1}{N+1}\sum_{k=m+d}^{m+d+N} L(k) \qquad (9.55)$$

For large $N$ one obtains

$$\hat C \approx \Big(1 + \sum_{i=1}^{m}\hat a_i\Big)\,Y_{00} - \Big(\sum_{i=1}^{m}\hat b_i\Big)\,U_{00} \qquad (9.56)$$

If $Y_{00}$ is of interest and $U_{00}$ is known, it can be calculated from (9.56) using the estimate $\hat C$.

In this case $\hat{\boldsymbol\theta}$ and $\hat C$ are only coupled in one direction, as $\hat{\boldsymbol\theta}$ does not depend on $\hat C$. A disadvantage can be the worse noise-to-signal ratio caused by differencing.
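The explicit two-step procedure can be sketched in Python/NumPy as follows, with the same assumed first-order process and operating point as in the implicit sketch ($U_{00}$ assumed known). Step 1 uses the differenced signals (9.46), step 2 forms $L(k)$ from (9.53) and averages it as in (9.55), and step 3 solves (9.56) for $Y_{00}$.

```python
import numpy as np

rng = np.random.default_rng(6)
a1, b1 = -0.8, 0.5            # assumed first-order process, d = 0
U00, Y00 = 2.0, 7.0           # assumed operating point; U00 treated as known
N = 2000
u = rng.choice([-1.0, 1.0], size=N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a1 * y[k - 1] + b1 * u[k - 1] + 0.01 * rng.standard_normal()
U, Y = U00 + u, Y00 + y

# Step 1: dynamic parameters from differenced signals (9.46); the DC values cancel
dU, dY = np.diff(U), np.diff(Y)
Psi = np.column_stack([-dY[:-1], dU[:-1]])
a1_hat, b1_hat = np.linalg.lstsq(Psi, dY[1:], rcond=None)[0]

# Step 2: L(k) from (9.53) and C as its mean (9.55)
L = Y[1:] + a1_hat * Y[:-1] - b1_hat * U[:-1]
C_hat = L.mean()

# Step 3: Y00 from (9.56) with known U00
Y00_hat = (C_hat + b1_hat * U00) / (1.0 + a1_hat)
print(a1_hat, b1_hat, Y00_hat)
```

Differencing turns white equation error into correlated error, so step 1 carries a small LS bias; at the low noise level assumed here it is negligible.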

The final selection of the DC method depends on the particular application. 

9.2.2 Extended least squares (ELS) method

If instead of the LS model

$$\hat A(z^{-1})\,y(z) - \hat B(z^{-1})\,z^{-d}\,u(z) = e(z) \qquad (9.57)$$

with an uncorrelated error signal $e(z)$, the ARMAX model

$$\hat A(z^{-1})\,y(z) - \hat B(z^{-1})\,z^{-d}\,u(z) = \hat D(z^{-1})\,e(z) \qquad (9.58)$$

with a correlated signal $\epsilon(z) = \hat D(z^{-1})\,e(z)$ is used, the recursive methods for dynamical processes and for stochastic signals can be combined to form an extended least squares method, [9.61], [9.47]. Based on

$$y(k) = \boldsymbol\psi^T(k)\,\hat{\boldsymbol\theta}(k-1) + e(k) \qquad (9.59)$$

the following extended vectors are introduced:

$$\boldsymbol\psi^T(k) = [-y(k-1)\;\ldots\;-y(k-m)\;|\;u(k-d-1)\;\ldots\;u(k-d-m)\;|\;\hat v(k-1)\;\ldots\;\hat v(k-p)] \qquad (9.60)$$

$$\hat{\boldsymbol\theta} = [\hat a_1\;\ldots\;\hat a_m\;|\;\hat b_1\;\ldots\;\hat b_m\;|\;\hat d_1\;\ldots\;\hat d_p]^T \qquad (9.61)$$

The recursive version is especially suited to this estimation problem. The parameters are then obtained by

$$\hat{\boldsymbol\theta}(k+1) = \hat{\boldsymbol\theta}(k) + \boldsymbol\gamma(k)\,[y(k+1) - \boldsymbol\psi^T(k+1)\,\hat{\boldsymbol\theta}(k)] \qquad (9.62)$$

and equations corresponding to (9.39) and (9.40). The signal values $\hat v(k) = \hat e(k)$ in $\boldsymbol\psi^T(k+1)$ are calculated recursively. Therefore the roots of $D(z) = 0$ must lie within the unit circle of the z-plane. The parameter estimation is unbiased and consistent in mean square if the convergence conditions of the LS method are transferred to the model equation (9.59). This means that (9.58) has to be valid. In addition,

$$H(z) = \frac{1}{D(z)} - \frac{1}{2} \qquad (9.63)$$

must be positive real. Besides this ELS method several other methods exist, for example, those of instrumental variables or maximum likelihood, see, e.g. [9.12], [9.37], [9.62] and [9.30].
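A minimal recursive ELS sketch in Python/NumPy for the first-order case $m = p = 1$, $d = 0$; the ARMAX process and its parameters are assumptions for illustration. The unmeasurable $e(k)$ in the data vector (9.60) is replaced by the residual computed with the updated estimate (an a-posteriori residual, a common choice in implementations; the text only requires $\hat v(k) = \hat e(k)$). For $D(z^{-1}) = 1 + 0.5z^{-1}$ the positive-real condition (9.63) holds.

```python
import numpy as np

def rels_first_order(u, y, alpha=1000.0):
    """Recursive ELS for y(k) + a1 y(k-1) = b1 u(k-1) + e(k) + d1 e(k-1) (m = p = 1, d = 0).
    psi^T(k) = [-y(k-1), u(k-1), v(k-1)], theta = [a1, b1, d1]^T, cf. (9.60)-(9.62)."""
    theta = np.zeros(3)
    P = alpha * np.eye(3)
    v_prev = 0.0                                      # estimate of the unmeasurable e(k-1)
    for k in range(1, len(y)):
        psi = np.array([-y[k - 1], u[k - 1], v_prev])
        Ppsi = P @ psi
        gamma = Ppsi / (psi @ Ppsi + 1.0)             # as in (9.39)
        theta = theta + gamma * (y[k] - psi @ theta)  # (9.62)
        P = P - np.outer(gamma, Ppsi)                 # as in (9.40)
        v_prev = y[k] - psi @ theta                   # residual with the updated estimate
    return theta

rng = np.random.default_rng(7)
N = 5000
u = rng.choice([-1.0, 1.0], size=N)
e = 0.2 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):   # assumed ARMAX process: a1 = -0.8, b1 = 0.5, d1 = 0.5
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1] + e[k] + 0.5 * e[k - 1]
theta = rels_first_order(u, y)
print(np.round(theta, 2))   # close to [-0.8, 0.5, 0.5]
```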

9.2.3 Modifications of basic recursive estimators

To improve some of the properties of basic parameter estimation methods, the corresponding algorithms can be modified. These modifications serve to improve the numerical properties in digital computers, to give access to intermediate results and to diminish the influence of starting values. The numerical properties are important if the word length is relatively small, as with 8-bit or 16-bit microcomputers, or if the changes of the input signals become small, as in adaptive control or fault detection. In both cases ill-conditioned equation systems result.

The numerical conditions can be improved by not calculating $\mathbf P$ as an intermediate value, in which the squares of signals appear, but square roots of $\mathbf P$. This leads to square-root filtering methods or factorization methods, see, for example, [9.6]. Here, forms can be distinguished that start from the covariance matrix $\mathbf P$ or from the information matrix $\mathbf P^{-1}$, [9.33], [9.6], [9.34]. The following treatment follows [9.31].

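a) Factorization methods for P

For discrete square-root filtering in covariance form (DSFC) the symmetric matrix $\mathbf P$ is decomposed into two triangular matrices

$$\mathbf P = \mathbf S\,\mathbf S^T \qquad (9.64)$$

where $\mathbf S$ is called the square root or the Cholesky factor of $\mathbf P$. For the RLS method (here written with a forgetting factor $\lambda(k)$, see Section 9.2.4) the resulting algorithm becomes:

$$\begin{aligned}
\hat{\boldsymbol\theta}(k+1) &= \hat{\boldsymbol\theta}(k) + \boldsymbol\gamma(k)\,e(k+1)\\
\boldsymbol\gamma(k) &= a(k)\,\mathbf S(k)\,\mathbf f(k)\\
\mathbf f(k) &= \mathbf S^T(k)\,\boldsymbol\psi(k+1)\\
\mathbf S(k+1) &= [\mathbf S(k) - g(k)\,\boldsymbol\gamma(k)\,\mathbf f^T(k)]\,/\sqrt{\lambda(k)}\\
1/a(k) &= \mathbf f^T(k)\,\mathbf f(k) + \lambda(k)\\
g(k) &= 1/\big[1 + \sqrt{\lambda(k)\,a(k)}\big]
\end{aligned} \qquad (9.65)$$

The starting values are $\mathbf S(0) = \sqrt{\alpha}\,\mathbf I$ (i.e. $\mathbf P(0) = \alpha\mathbf I$ as in (9.41)) and $\hat{\boldsymbol\theta}(0) = \mathbf 0$. A disadvantage is the calculation of square roots in each recursion.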
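The DSFC recursion (9.65) can be checked numerically against conventional RLS with forgetting. The sketch below, in Python/NumPy, is illustrative: the parameter vector, regressors, noise and $\lambda = 0.98$ are assumptions, and `dsfc_step` is a hypothetical name. Note that this form of the update maintains $\mathbf S\mathbf S^T = \mathbf P$ but does not keep $\mathbf S$ triangular; triangularity is not needed for the recursion itself.

```python
import numpy as np

def dsfc_step(theta, S, psi, y_new, lam=1.0):
    """One DSFC recursion (9.65); P = S S^T is never formed."""
    f = S.T @ psi                          # f(k) = S^T(k) psi(k+1)
    a = 1.0 / (f @ f + lam)                # 1/a(k) = f^T f + lambda
    gamma = a * (S @ f)                    # gamma(k) = a(k) S(k) f(k)
    g = 1.0 / (1.0 + np.sqrt(lam * a))     # g(k)
    theta = theta + gamma * (y_new - psi @ theta)
    S = (S - g * np.outer(gamma, f)) / np.sqrt(lam)
    return theta, S

# comparison with conventional RLS with forgetting, (9.98)-(9.100)
rng = np.random.default_rng(8)
n, lam, alpha = 3, 0.98, 100.0
theta_true = np.array([1.0, -0.5, 0.25])          # assumed parameters
theta_c, P = np.zeros(n), alpha * np.eye(n)
theta_s, S = np.zeros(n), np.sqrt(alpha) * np.eye(n)
for _ in range(200):
    psi = rng.standard_normal(n)
    y_new = psi @ theta_true + 0.01 * rng.standard_normal()
    Ppsi = P @ psi
    gamma = Ppsi / (psi @ Ppsi + lam)
    theta_c = theta_c + gamma * (y_new - psi @ theta_c)
    P = (P - np.outer(gamma, Ppsi)) / lam
    theta_s, S = dsfc_step(theta_s, S, psi, y_new, lam)
print(np.max(np.abs(S @ S.T - P)), np.max(np.abs(theta_s - theta_c)))  # rounding level
```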

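Another method has been proposed by [9.6], the so-called U-D factorization (DUDC). Here, the covariance matrix is factorized by

$$\mathbf P = \mathbf U\,\mathbf D\,\mathbf U^T \qquad (9.66)$$

where $\mathbf D$ is diagonal and $\mathbf U$ is an upper triangular matrix with ones on the diagonal. Then the recursion for the covariance matrix is

$$\mathbf U(k+1)\,\mathbf D(k+1)\,\mathbf U^T(k+1) = \frac{1}{\lambda}\,[\mathbf U(k)\,\mathbf D(k)\,\mathbf U^T(k) - \boldsymbol\gamma(k)\,\boldsymbol\psi^T(k+1)\,\mathbf U(k)\,\mathbf D(k)\,\mathbf U^T(k)] \qquad (9.67)$$

After substitution of (9.39) and (9.99), the right-hand side becomes

$$\mathbf U\mathbf D\mathbf U^T = \frac{1}{\lambda}\,\mathbf U(k)\left[\mathbf D(k) - \frac{1}{\alpha(k)}\,\mathbf v(k)\,\mathbf f^T(k)\,\mathbf D(k)\right]\mathbf U^T(k) = \frac{1}{\lambda}\,\mathbf U(k)\left[\mathbf D(k) - \frac{1}{\alpha(k)}\,\mathbf v(k)\,\mathbf v^T(k)\right]\mathbf U^T(k) \qquad (9.68)$$

where

$$\mathbf f(k) = \mathbf U^T(k)\,\boldsymbol\psi(k+1), \qquad \mathbf v(k) = \mathbf D(k)\,\mathbf f(k), \qquad \alpha(k) = \lambda + \mathbf f^T(k)\,\mathbf v(k) \qquad (9.69)$$

The correcting vector then becomes

$$\boldsymbol\gamma(k) = \frac{1}{\alpha(k)}\,\mathbf U(k)\,\mathbf v(k) \qquad (9.70)$$

If the term $(\mathbf D - \alpha^{-1}\mathbf v\mathbf v^T)$ in (9.68) is factorized again, the recursions for the elements of $\mathbf U$, $\mathbf D$ and $\boldsymbol\gamma$ become, [9.6],

$$\left.\begin{aligned}
\alpha_j &= \alpha_{j-1} + v_j f_j\\
d_j(k+1) &= d_j(k)\,\alpha_{j-1}/(\alpha_j\,\lambda)\\
b_j &= v_j\\
\mu_j &= -f_j/\alpha_{j-1}
\end{aligned}\;\right\}\quad j = 2, \ldots, 2m \qquad (9.71)$$

with the initial values

$$\alpha_1 = \lambda + v_1 f_1, \qquad b_1 = v_1, \qquad d_1(k+1) = \frac{d_1(k)}{\alpha_1} \qquad (9.72)$$

and, for each $j$, the update of the off-diagonal elements of $\mathbf U$

$$\left.\begin{aligned}
u_{ij}(k+1) &= u_{ij}(k) + b_i\,\mu_j\\
b_i &\leftarrow b_i + u_{ij}(k)\,v_j
\end{aligned}\;\right\}\quad i = 1, \ldots, j-1 \qquad (9.73)$$

Finally,

$$\boldsymbol\gamma(k) = \mathbf b/\alpha_{2m} \qquad (9.74)$$

$$\hat{\boldsymbol\theta}(k+1) = \hat{\boldsymbol\theta}(k) + \boldsymbol\gamma(k)\,e(k+1), \qquad e(k+1) = y(k+1) - \boldsymbol\psi^T(k+1)\,\hat{\boldsymbol\theta}(k) \qquad (9.75)$$

(9.74), (9.71) and (9.73) are calculated instead of (9.39) and (9.40). Compared to DSFC, no square-root calculations are required. The computational expense is comparable to that of RLS. The numerical properties are similar to those of DSFC; only the matrix elements of $\mathbf U$ and $\mathbf D$ may become larger than those of $\mathbf S$.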
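The U-D recursions (9.69) to (9.75) can be sketched in Python/NumPy and checked against conventional RLS with forgetting. The test data, $\lambda = 0.98$ and the name `ud_step` are assumptions for illustration. Starting the loop with $\alpha_0 = \lambda$ makes the first pass reproduce the initial values (9.72).

```python
import numpy as np

def ud_step(theta, U, d, psi, y_new, lam=1.0):
    """One RLS step with P = U diag(d) U^T, following (9.69)-(9.75) (U-D update)."""
    n = len(theta)
    U, d = U.copy(), d.copy()
    f = U.T @ psi                      # (9.69)
    v = d * f
    b = np.zeros(n)
    alpha_prev = lam                   # alpha_0 = lambda, so j = 1 reproduces (9.72)
    for j in range(n):
        alpha_j = alpha_prev + v[j] * f[j]                 # (9.71)
        d[j] = d[j] * alpha_prev / (alpha_j * lam)
        b[j] = v[j]
        mu = -f[j] / alpha_prev
        for i in range(j):             # (9.73); the b update uses the old u_ij
            u_old = U[i, j]
            U[i, j] = u_old + b[i] * mu
            b[i] = b[i] + u_old * v[j]
        alpha_prev = alpha_j
    gamma = b / alpha_prev             # (9.74)
    theta = theta + gamma * (y_new - psi @ theta)          # (9.75)
    return theta, U, d

rng = np.random.default_rng(9)
n, lam, alpha = 4, 0.98, 100.0
theta_true = np.array([-1.5, 0.7, 1.0, 0.5])   # assumed parameters
theta_c, P = np.zeros(n), alpha * np.eye(n)
theta_u, U, d = np.zeros(n), np.eye(n), alpha * np.ones(n)
for _ in range(300):
    psi = rng.standard_normal(n)
    y_new = psi @ theta_true + 0.01 * rng.standard_normal()
    Ppsi = P @ psi
    gamma = Ppsi / (psi @ Ppsi + lam)
    theta_c = theta_c + gamma * (y_new - psi @ theta_c)
    P = (P - np.outer(gamma, Ppsi)) / lam
    theta_u, U, d = ud_step(theta_u, U, d, psi, y_new, lam)
print(np.max(np.abs(U @ np.diag(d) @ U.T - P)))   # rounding level
```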

To reduce the calculations after each sampling, invariance properties of the matrices, [9.36], may be used to generate fast algorithms. A saving of calculation time only results for orders $m > 5$, and it comes at the cost of greater storage requirements and higher sensitivity to starting values.


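b) Factorization methods for P⁻¹


Discrete square-root filtering in information form (DSFI) results from the nonrecursive LS method in the form

$$\mathbf P^{-1}(k+1)\,\hat{\boldsymbol\theta}(k+1) = \boldsymbol\Psi^T(k+1)\,\mathbf y(k+1) = \mathbf f(k+1) \qquad (9.76)$$

with

$$\mathbf P^{-1}(k+1) = \lambda\,\mathbf P^{-1}(k) + \boldsymbol\psi(k+1)\,\boldsymbol\psi^T(k+1), \qquad \mathbf f(k+1) = \lambda\,\mathbf f(k) + \boldsymbol\psi(k+1)\,y(k+1) \qquad (9.77)$$

The information matrix $\mathbf P^{-1}$ is now split into upper triangular matrices $\mathbf R$:

$$\mathbf P^{-1} = \mathbf R^T\,\mathbf R \qquad (9.78)$$

Note that $\mathbf R = \mathbf S^{-1}$, cf. (9.64). Then $\hat{\boldsymbol\theta}(k+1)$ is calculated from (9.76) by back substitution from

$$\mathbf R(k+1)\,\hat{\boldsymbol\theta}(k+1) = \mathbf b(k+1) \qquad (9.79)$$

This equation follows from (9.76) by introducing an orthonormal transformation matrix $\mathbf Q$ (with $\mathbf Q^T\mathbf Q = \mathbf I$) such that

$$\boldsymbol\Psi^T\mathbf Q^T\mathbf Q\,\boldsymbol\Psi\,\hat{\boldsymbol\theta} = \boldsymbol\Psi^T\mathbf Q^T\mathbf Q\,\mathbf y \qquad (9.80)$$

Here

$$\mathbf Q\,\boldsymbol\Psi = \begin{pmatrix}\mathbf R\\ \mathbf 0\end{pmatrix} \qquad (9.81)$$

possesses an upper triangular form, and the equation

$$\mathbf Q\,\mathbf y = \begin{pmatrix}\mathbf b\\ \mathbf w\end{pmatrix} \qquad (9.82)$$

holds. With equation (9.80) it follows that

$$\mathbf Q(k+1)\,\boldsymbol\Psi(k+1)\,\hat{\boldsymbol\theta}(k+1) = \mathbf Q(k+1)\,\mathbf y(k+1) \qquad (9.83)$$

¹ Compiled by Michael Vogt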

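Actually, DSFI uses a different idea to minimize the sum of squared errors

$$V = \sum e^2(k) = \|\mathbf e\|_2^2 = \|\boldsymbol\Psi\boldsymbol\theta - \mathbf y\|_2^2 \qquad (9.84)$$

Whereas the LS method solves the normal equations $\nabla V = \mathbf 0$, here the QR factorization $\mathbf Q\boldsymbol\Psi = \begin{pmatrix}\mathbf R\\ \mathbf 0\end{pmatrix}$ is used to simplify (9.84). This relies on the fact that multiplication by an orthonormal matrix $\mathbf Q$ does not change the norm of a vector:

$$V = \|\boldsymbol\Psi\boldsymbol\theta - \mathbf y\|_2^2 = \|\mathbf Q\boldsymbol\Psi\boldsymbol\theta - \mathbf Q\mathbf y\|_2^2 = \left\|\begin{pmatrix}\mathbf R\boldsymbol\theta - \mathbf b\\ -\mathbf w\end{pmatrix}\right\|_2^2 = \|\mathbf R\boldsymbol\theta - \mathbf b\|_2^2 + \|\mathbf w\|_2^2 = \min \qquad (9.85)$$

As already stated in (9.79), the parameters $\hat{\boldsymbol\theta}$ are determined by solving the system $\mathbf R\hat{\boldsymbol\theta} - \mathbf b = \mathbf 0$, whereas $\|\mathbf w\|_2^2$ is the remaining residual, i.e. the sum of squared errors for the optimal parameters $\hat{\boldsymbol\theta}$. The advantage of this orthogonalization method can be seen from the error sensitivity of the system that determines the parameters, see [9.15]. If the normal equations (9.76) are solved directly by the LS method, the parameter error is bounded by

$$\frac{\|\Delta\hat{\boldsymbol\theta}\|}{\|\hat{\boldsymbol\theta}\|} \le \operatorname{cond}(\mathbf P^{-1})\,\frac{\|\Delta\mathbf y\|}{\|\mathbf y\|} = \operatorname{cond}^2(\boldsymbol\Psi)\,\frac{\|\Delta\mathbf y\|}{\|\mathbf y\|} \qquad (9.86)$$

where $\operatorname{cond}(\cdot)$ is the condition number, measuring the error sensitivity of the solution with respect to an error $\Delta\mathbf y$ in the process output signal. A similar bound can also be found for the input signal. However, if the orthogonalization approach is used, the upper bound for the parameter errors is given by

$$\frac{\|\Delta\hat{\boldsymbol\theta}\|}{\|\hat{\boldsymbol\theta}\|} \le \operatorname{cond}(\mathbf R)\,\frac{\|\Delta\mathbf b\|}{\|\mathbf b\|} = \operatorname{cond}(\boldsymbol\Psi)\,\frac{\|\Delta\mathbf y\|}{\|\mathbf y\|} \qquad (9.87)$$

i.e. the system (9.79) is much less sensitive to measurement errors than the normal equations (9.76) themselves.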
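The squaring of the condition number in (9.86) is easy to observe numerically. In this Python/NumPy sketch, an assumed slowly varying input makes the regressors $u(k)$ and $u(k-1)$ nearly collinear; the parameters are also assumed. `np.linalg.qr` returns the reduced factorization, so $\mathbf Q^T\mathbf y$ directly gives $\mathbf b$ of (9.82) and the triangular system (9.79) is solved for $\hat{\boldsymbol\theta}$.

```python
import numpy as np

t = np.arange(200)
u = np.sin(0.01 * t)                        # assumed slowly varying input
Psi = np.column_stack([u[1:], u[:-1]])      # nearly collinear regressors u(k), u(k-1)
theta_true = np.array([1.0, -0.5])          # assumed parameters
y = Psi @ theta_true

cond_psi = np.linalg.cond(Psi)
cond_normal = np.linalg.cond(Psi.T @ Psi)   # cond(P^-1) = cond(Psi)^2, cf. (9.86)

Q, R = np.linalg.qr(Psi)                    # reduced QR: Q^T Q = I, R upper triangular
b = Q.T @ y                                 # upper part of Q y, cf. (9.82)
theta_qr = np.linalg.solve(R, b)            # R theta = b, cf. (9.79)
print(cond_psi, cond_normal, theta_qr)
```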

The main effort of the method described above is the computation of $\mathbf R$ and $\mathbf b$. This is usually done by applying Householder transformations to the matrix $(\boldsymbol\Psi\;\mathbf y)$, see [9.15], so that $\mathbf Q$ does not need to be computed explicitly. DSFI, however, computes $\mathbf R$ and $\mathbf b$ recursively. Assuming that in each step one row is appended to $(\boldsymbol\Psi\;\mathbf y)$, (9.83) is now transferred to a recursive form, [9.33],


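$$\begin{pmatrix}\mathbf R(k+1)\\ \mathbf 0^T\end{pmatrix} = \mathbf Q(k+1)\begin{pmatrix}\sqrt{\lambda}\,\mathbf R(k)\\ \boldsymbol\psi^T(k+1)\end{pmatrix} \qquad (9.88)$$

$$\begin{pmatrix}\mathbf b(k+1)\\ w(k+1)\end{pmatrix} = \mathbf Q(k+1)\begin{pmatrix}\sqrt{\lambda}\,\mathbf b(k)\\ y(k+1)\end{pmatrix} \qquad (9.89)$$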
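Then $\mathbf R(k+1)$ and $\mathbf b(k+1)$ are used to calculate $\hat{\boldsymbol\theta}(k+1)$ with (9.79), whereas $w(k+1)$ is the current residual. This form is partially nonrecursive and partially recursive. It has the advantage that no starting values need to be assumed, because the recursion can start with $\mathbf R(0) = \mathbf 0$. The method is especially suitable if the parameters are not required at each sampling time; then only $\mathbf R$ and $\mathbf b$ have to be calculated recursively. This is done by applying Givens rotations to the right-hand sides of (9.88) and (9.89). The Givens rotation

$$\mathbf G = \begin{pmatrix}\gamma & \sigma\\ -\sigma & \gamma\end{pmatrix} \qquad (9.90)$$

is applied to a $2\times n$ matrix $\mathbf M$ in order to eliminate the element $m'_{21}$ in the transformed matrix $\mathbf M' = \mathbf G\mathbf M$, i.e. to introduce a zero in the matrix

$$\begin{pmatrix}\gamma & \sigma\\ -\sigma & \gamma\end{pmatrix}\begin{pmatrix}m_{11} & m_{12} & \cdots\\ m_{21} & m_{22} & \cdots\end{pmatrix} = \begin{pmatrix}m'_{11} & m'_{12} & \cdots\\ 0 & m'_{22} & \cdots\end{pmatrix}$$

The two conditions

$$\begin{aligned}
\det(\mathbf G) &= \gamma^2 + \sigma^2 = 1 && \text{(normalization)}\\
m'_{21} &= -\sigma\,m_{11} + \gamma\,m_{21} = 0 && \text{(elimination of } m'_{21})
\end{aligned} \qquad (9.91)$$

yield the rotation parameters

$$\gamma = \frac{m_{11}}{\sqrt{m_{11}^2 + m_{21}^2}} \qquad (9.92)$$

and

$$\sigma = \frac{m_{21}}{\sqrt{m_{11}^2 + m_{21}^2}} \qquad (9.93)$$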

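This transformation is now applied sequentially to $\boldsymbol\psi^T(k+1)$ and the rows of $\sqrt{\lambda}\,\mathbf R$ in (9.88), where $\mathbf G$ is now interpreted as an $(n+1)\times(n+1)$ matrix. The rotation $\mathbf G_i$ acts only on row $i$ of $\sqrt{\lambda}\,\mathbf R(k)$ and on the appended last row, and annihilates the $i$-th element of that last row. After $\mathbf G_1, \ldots, \mathbf G_n$ the appended row is zero and the upper part is again upper triangular. The product of the Givens matrices is the transformation matrix $\mathbf Q(k+1)$:

$$\begin{pmatrix}\mathbf R(k+1)\\ \mathbf 0^T\end{pmatrix} = \underbrace{\mathbf G_n(k+1)\cdots\mathbf G_1(k+1)}_{\mathbf Q(k+1)}\begin{pmatrix}\sqrt{\lambda}\,\mathbf R(k)\\ \boldsymbol\psi^T(k+1)\end{pmatrix} \qquad (9.94)$$

that produces $\mathbf R(k+1)$. The same method is used to compute $\mathbf b(k+1)$. A complete DSFI update step can be described as follows: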
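Compute for $i = 1, \ldots, n$:

$$\begin{aligned}
r_{ii}(k+1) &= \sqrt{\lambda\,r_{ii}^2(k) + \big(\psi_i^{(i)}(k+1)\big)^2}\\
\gamma &= \sqrt{\lambda}\,r_{ii}(k)/r_{ii}(k+1)\\
\sigma &= \psi_i^{(i)}(k+1)/r_{ii}(k+1)\\
r_{ij}(k+1) &= \sqrt{\lambda}\,\gamma\,r_{ij}(k) + \sigma\,\psi_j^{(i)}(k+1), \qquad j = i+1, \ldots, n\\
\psi_j^{(i+1)}(k+1) &= -\sqrt{\lambda}\,\sigma\,r_{ij}(k) + \gamma\,\psi_j^{(i)}(k+1), \qquad j = i+1, \ldots, n\\
b_i(k+1) &= \sqrt{\lambda}\,\gamma\,b_i(k) + \sigma\,y^{(i)}(k+1)\\
y^{(i+1)}(k+1) &= -\sqrt{\lambda}\,\sigma\,b_i(k) + \gamma\,y^{(i)}(k+1)
\end{aligned} \qquad (9.95)$$

Here $\psi_j^{(i)}$ and $y^{(i)}$ denote the elements of the appended row after the first $i-1$ rotations ($\boldsymbol\psi^{(1)} = \boldsymbol\psi$, $y^{(1)} = y$), and the residual is $w(k+1) = y^{(n+1)}(k+1)$.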
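One DSFI update step (9.88), (9.89), (9.95) can be sketched in Python/NumPy as below; `dsfi_update` is a hypothetical name and the regression data are assumed. The sketch starts from $\mathbf R(0) = \mathbf 0$, $\mathbf b(0) = \mathbf 0$ with $\lambda = 1$ and checks two properties stated in the text: back substitution (9.79) reproduces the batch LS estimate, and the accumulated squared residuals $w^2(k)$ equal the minimal sum of squared errors, cf. (9.85).

```python
import numpy as np

def dsfi_update(R, b, psi, y_new, lam=1.0):
    """Rotate the new row [psi^T, y] into (sqrt(lam) R, sqrt(lam) b) with Givens rotations,
    following (9.88), (9.89) and (9.95). Returns R(k+1), b(k+1) and the residual w(k+1)."""
    n = len(b)
    R, b = np.sqrt(lam) * R, np.sqrt(lam) * b      # new arrays; inputs are not modified
    row, yk = np.array(psi, dtype=float), float(y_new)
    for i in range(n):
        r_new = np.hypot(R[i, i], row[i])
        if r_new == 0.0:                           # nothing to eliminate (possible while R(0) = 0)
            continue
        c, s = R[i, i] / r_new, row[i] / r_new     # gamma and sigma of (9.92), (9.93)
        Ri = R[i, i:].copy()
        R[i, i:] = c * Ri + s * row[i:]
        row[i:] = -s * Ri + c * row[i:]            # row[i] becomes zero
        b[i], yk = c * b[i] + s * yk, -s * b[i] + c * yk
    return R, b, yk

rng = np.random.default_rng(10)
n, N = 3, 100
theta_true = np.array([0.5, -1.0, 2.0])            # assumed parameters
Psi = rng.standard_normal((N, n))
y = Psi @ theta_true + 0.1 * rng.standard_normal(N)

R, b = np.zeros((n, n)), np.zeros(n)               # R(0) = 0: no starting values needed
w2 = 0.0
for psi_k, y_k in zip(Psi, y):
    R, b, w = dsfi_update(R, b, psi_k, y_k)
    w2 += w ** 2
theta_dsfi = np.linalg.solve(R, b)                 # back substitution (9.79)
theta_ls = np.linalg.lstsq(Psi, y, rcond=None)[0]
rss = np.sum((y - Psi @ theta_ls) ** 2)
print(theta_dsfi, theta_ls, w2, rss)
```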


No essential differences in the numerical properties can be observed between DSFC and DSFI. Like DSFC, DSFI requires the computation of $n$ square roots in each step. There are also factorizations for $\mathbf P^{-1}$ that do not require square roots, just like the U-D factorization for $\mathbf P$. These techniques replace the Givens rotations by fast Givens rotations, see [9.15], or employ recursive forms of the Gram-Schmidt orthogonalization. These fast orthogonalization methods show the same error sensitivity, but their matrix elements may become larger than those of DSFI.

Further discussion of square-root filtering may be found in [9.49], [9.16] and [9.55].


9.2.4 Parameter estimation of time-varying processes 

For many processes the parameters of the models are not constant. They change with time because of internal or external influences. Frequently the dynamic behavior is linearized around the operating point for small signal changes. After changes of the operating point, the real nonlinear behavior becomes effective. If the nonlinear behavior is not very strong and the operating point changes slowly, useful results can be obtained with linear difference equations and time-varying parameters.

The following treatment is for RLS. 

a) Exponential weighting with constant forgetting 

Up to now it has been assumed that the process parameters to be estimated are constant, and therefore the measured signals $u(k)$ and $y(k)$ and the equation error $e(k)$ are weighted equally over the measuring time $k = 0, \ldots, N$. If the recursive estimation algorithms are to follow slowly time-varying process parameters, more recent measurements must be weighted more strongly than old measurements. Therefore the estimation algorithms should have a fading memory. This can be incorporated into the least squares method by time-dependent weighting of the squared errors, the method of weighted least squares, see, e.g. [9.29]:

$$V = \sum_{k=m+d}^{m+d+N} w(k)\,e^2(k) \qquad (9.96)$$

By choosing

$$w(k) = \lambda^{(m+d+N)-k} = \lambda^{N'-k}, \qquad 0 < \lambda < 1 \qquad (9.97)$$

the errors $e(k)$ are weighted as shown in Table 9.2 for $N' = 50$. The weighting then increases exponentially to 1 at $k = N'$.


Table 9.2. Weighting factors due to (9.97) for N' = 50

k           1     10    20    30    40    47    48    49    50
λ = 0.99    0.61  0.67  0.74  0.82  0.90  0.97  0.98  0.99  1
λ = 0.95    0.08  0.13  0.21  0.36  0.60  0.86  0.90  0.95  1
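The entries of Table 9.2 follow directly from (9.97). This short Python/NumPy sketch recomputes them, rounded to two decimals.

```python
import numpy as np

N_prime = 50
k = np.array([1, 10, 20, 30, 40, 47, 48, 49, 50])
rows = {}
for lam in (0.99, 0.95):
    rows[lam] = np.round(lam ** (N_prime - k), 2)   # (9.97): w(k) = lambda^(N' - k)
    print(f"lambda = {lam}:", rows[lam])
```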


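This leads to an exponentially fading memory. The recursive estimation algorithms are then

$$\hat{\boldsymbol\theta}(k+1) = \hat{\boldsymbol\theta}(k) + \boldsymbol\gamma(k)\,[y(k+1) - \boldsymbol\psi^T(k+1)\,\hat{\boldsymbol\theta}(k)] \qquad (9.98)$$

$$\boldsymbol\gamma(k) = \frac{1}{\boldsymbol\psi^T(k+1)\,\mathbf P(k)\,\boldsymbol\psi(k+1) + \lambda}\,\mathbf P(k)\,\boldsymbol\psi(k+1) \qquad (9.99)$$

$$\mathbf P(k+1) = [\mathbf I - \boldsymbol\gamma(k)\,\boldsymbol\psi^T(k+1)]\,\mathbf P(k)\,\frac{1}{\lambda} \qquad (9.100)$$

The influence of the forgetting factor $\lambda$ can be recognized directly from the inverse of the covariance matrix

$$\mathbf P^{-1}(k+1) = \lambda\,\mathbf P^{-1}(k) + \boldsymbol\psi(k+1)\,\boldsymbol\psi^T(k+1) \qquad (9.101)$$

$\mathbf P^{-1}$ is proportional to the information matrix $\mathbf J$ given by

$$\mathbf J = \frac{1}{\sigma_e^2}\,E\{\boldsymbol\Psi^T\boldsymbol\Psi\} = \frac{1}{\sigma_e^2}\,E\{\mathbf P^{-1}\} \qquad (9.102)$$

see [9.12], [9.29].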

By taking $\lambda < 1$, the information of the last step is diminished, or equivalently the covariances are increased. This means that the estimates are treated as less accurate, such that new measurements get more weight.

For $\lambda = 1$ we have

$$\lim_{k\to\infty} E\{\mathbf P(k)\} = \mathbf 0 \qquad (9.103)$$

$$\lim_{k\to\infty} E\{\boldsymbol\gamma(k)\} = \lim_{k\to\infty} E\{\mathbf P(k+1)\,\boldsymbol\psi(k+1)\} = \mathbf 0 \qquad (9.104)$$

because the elements of $\mathbf P^{-1}(k+1)$ tend to infinity, see (9.101). For large $k$ the new measurements then have practically no influence on $\hat{\boldsymbol\theta}(k+1)$.
If, however, $\lambda < 1$, then from (9.101)

$$\mathbf P^{-1}(k) = \lambda^k\,\mathbf P^{-1}(0) + \sum_{i=1}^{k}\lambda^{k-i}\,\boldsymbol\psi(i)\,\boldsymbol\psi^T(i) \qquad (9.105)$$

For large values $\alpha$ of the starting matrix $\mathbf P(0) = \alpha\mathbf I$ the first term vanishes. As for $\lambda < 1$

$$\lim_{k\to\infty}\sum_{i=1}^{k}\lambda^{k-i} = \lim_{k\to\infty}\sum_{i=0}^{k-1}\lambda^{i} = \frac{1}{1-\lambda} < \infty \qquad (9.106)$$

(convergent geometric series with positive elements), $\mathbf P^{-1}(k)$ converges to fixed values

$$\lim_{k\to\infty} E\{\mathbf P^{-1}(k)\} = \mathbf P^{-1}(\infty) \qquad (9.107)$$

and does not approach infinity. Hence,

$$\lim_{k\to\infty} E\{\mathbf P(k)\} = \mathbf P(\infty) \qquad (9.108)$$

and

$$\lim_{k\to\infty} E\{\boldsymbol\gamma(k)\} = \boldsymbol\gamma(\infty) \qquad (9.109)$$

are finite and nonzero. Therefore the new measurements get a constant weight for large $k$, and the estimator remains sensitive to parameter changes. Because of the shorter effective averaging time, the noise influence increases, and so do the variances of the estimates. Examples are shown in [9.30].

The forgetting factor $\lambda$ has to be selected as follows:

• $\lambda$ small, if the speed of parameter changes is large (say $\lambda = 0.90$). Then only small noise is allowed;

• $\lambda$ large, if the speed of parameter changes is small (say $\lambda = 0.98$). Then the noise can be larger.

As the RML and RELS methods converge more slowly during the starting phase, the 
convergence can be accelerated by smaller weights at the beginning. 

Parameter estimation algorithms with a constant forgetting factor are suited for processes with small parameter changes and persistent input excitation. Even if the process parameters are constant, good results are obtainable if the noise is not too large with regard to the memory length $M = 1/(1-\lambda)$. However, problems may arise if, with a constant forgetting factor $\lambda < 1$, the input is not sufficiently exciting. Then the values of $\mathbf P^{-1}(k+1)$ decrease because $\boldsymbol\psi(k+1) \approx \mathbf 0$, see (9.101); equivalently, the elements of $\mathbf P(k+1)$ increase continuously (the covariance matrix blows up). As the correcting vector is

$$\boldsymbol\gamma(k) = \mathbf P(k+1)\,\boldsymbol\psi(k+1) \qquad (9.110)$$

the estimator becomes more and more sensitive. Then a small disturbance or a numerical error may suffice to generate sudden large changes of the parameter estimates. The estimator then becomes unstable. This situation can be observed with adaptive control systems. Therefore, the input excitation has to be monitored or the forgetting factor has to be time-variant.
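The blow-up of the covariance matrix can be reproduced with a few lines of Python/NumPy. The sketch is idealized: it assumes $\boldsymbol\psi(k+1) = \mathbf 0$ exactly, in which case (9.100) reduces to $\mathbf P(k+1) = \mathbf P(k)/\lambda$. The starting value of $\mathbf P$ and the small test regressor are assumed. It then compares the correcting vector (9.99) for the same small regressor before and after the non-exciting phase.

```python
import numpy as np

lam = 0.95
P = 0.01 * np.eye(2)               # assumed P after a well-excited phase
psi = np.zeros(2)                  # input no longer exciting: psi(k+1) = 0
trace = [np.trace(P)]
for _ in range(100):
    Ppsi = P @ psi
    gamma = Ppsi / (psi @ Ppsi + lam)
    P = (P - np.outer(gamma, Ppsi)) / lam      # (9.100); reduces to P / lam here
    trace.append(np.trace(P))

# sensitivity: correcting vector (9.99) for the same small regressor before and after
psi_small = np.array([0.0, 0.1])
P0 = 0.01 * np.eye(2)
g_before = P0 @ psi_small / (psi_small @ P0 @ psi_small + lam)
g_after = P @ psi_small / (psi_small @ P @ psi_small + lam)
print(trace[-1] / trace[0], np.linalg.norm(g_after) / np.linalg.norm(g_before))
```

After 100 steps the trace of $\mathbf P$ has grown by a factor of $0.95^{-100}$, and the same small regressor now produces a correction more than a hundred times larger.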
9.2.5 Parameter estimation for continuous-time signals

Parameter estimation methods for dynamic processes were first developed for process models in discrete time in combination with digital control systems. For some applications, e.g. the validation of theoretical models or fault diagnosis, however, parameter estimation methods for models with continuous-time signals are needed.


Method of least squares


A stable process with lumped parameters is considered, which can be described by the linear, time-invariant differential equation

$$a_n y_u^{(n)}(t) + a_{n-1} y_u^{(n-1)}(t) + \ldots + a_1 y_u^{(1)}(t) + y_u(t) = b_m u^{(m)}(t) + b_{m-1} u^{(m-1)}(t) + \ldots + b_1 u^{(1)}(t) + b_0 u(t), \qquad m \le n \qquad (9.111)$$

It is assumed that the derivatives of the output signal

$$y^{(j)}(t) = \frac{d^j y(t)}{dt^j}, \qquad j = 1, 2, \ldots, n \qquad (9.112)$$

and of the input signal for $j = 1, 2, \ldots, m$ exist. $u(t)$ and $y(t)$ are the deviations

$$u(t) = U(t) - U_{00}, \qquad y(t) = Y(t) - Y_{00} \qquad (9.113)$$

of the absolute signals $U(t)$ and $Y(t)$ from the operating point described by $U_{00}$ and $Y_{00}$. The transfer function corresponding to (9.111) is

$$G_P(s) = \frac{y_u(s)}{u(s)} = \frac{B(s)}{A(s)} = \frac{b_0 + b_1 s + \ldots + b_{m-1} s^{m-1} + b_m s^m}{1 + a_1 s + \ldots + a_{n-1} s^{n-1} + a_n s^n} \qquad (9.114)$$

The measurable signal $y(t)$ contains an additional disturbance signal $n(t)$

$$y(t) = y_u(t) + n(t) \qquad (9.115)$$

Substituting (9.115) into (9.111) and introducing an equation error $e(t)$ yields

$$y(t) = \boldsymbol\psi^T(t)\,\hat{\boldsymbol\theta} + e(t) \qquad (9.116)$$

with

$$\boldsymbol\psi^T(t) = [-y^{(1)}(t)\;\ldots\;-y^{(n)}(t)\;|\;u(t)\;\ldots\;u^{(m)}(t)] \qquad (9.117)$$

$$\hat{\boldsymbol\theta} = [\hat a_1\;\ldots\;\hat a_n\;|\;\hat b_0\;\ldots\;\hat b_m]^T \qquad (9.118)$$

The input and output signals are measured at discrete time samples $t = kT_0$, $k = 0, 1, 2, \ldots, N$ with sampling time $T_0$, and the derivatives are generated. Based on this, $N+1$ equations can be written down
$$y(k) = \boldsymbol\psi^T(k)\,\hat{\boldsymbol\theta} + e(k) \qquad (9.119)$$

This system of equations can be written in matrix notation as

$$\mathbf y = \boldsymbol\Psi\,\hat{\boldsymbol\theta} + \mathbf e \qquad (9.120)$$

Minimizing the loss function

$$V = \mathbf e^T(N)\,\mathbf e(N) = \sum_{k=0}^{N} e^2(k) \qquad (9.121)$$

yields with $dV/d\boldsymbol\theta = \mathbf 0$, as previously shown in Section 9.2.1, the vector of parameter estimates for the least squares method

$$\hat{\boldsymbol\theta}(N) = [\boldsymbol\Psi^T\boldsymbol\Psi]^{-1}\,\boldsymbol\Psi^T\mathbf y \qquad (9.122)$$

The existence of a unique solution requires that the matrix $\boldsymbol\Psi^T\boldsymbol\Psi$ is positive definite. After dividing this matrix by the number of samples $N+1$, the elements of the resulting matrix are the estimates of the correlation functions $\Phi(\tau)$ of the signal derivatives for $\tau = 0$, i.e. with no time shift. The form is thus very similar to the least squares method for models with discrete-time signals. Hence, many of the derivations can be transferred directly, such as the recursive formulation and the numerically improved versions. However, particular problems arise concerning the convergence and the evaluation of the required signal derivatives.

A convergence analysis shows that, as in the discrete-time case, the estimates for continuous-time signals are unbiased only if the error signal $e(k)$ is statistically independent. Hence, the estimates are in general biased for disturbed processes.

If the required derivatives of the signals are directly measurable (e.g. in vehicle applications), these values can be written into the data matrix, and the correlation functions in the matrix $\boldsymbol\Psi^T\boldsymbol\Psi$ can be calculated directly. However, if the derivatives are not measurable, they have to be evaluated from the sampled signals $u(t)$ and $y(t)$. For this, the following methods basically exist. Numerical differentiation in combination with interpolation approaches (splines, Newton's method) is usually not able to suppress noise due to disturbance signals. State variable filters (SVF), see Figure 9.4,

$$F(s) = \frac{y_F(s)}{y(s)} = \frac{1}{f_0 + f_1 s + \ldots + f_{n-1} s^{n-1} + s^n} \qquad (9.123)$$

have proven to yield good results. The state variable filter is a low-pass filter that provides the derivatives and at the same time filters the disturbance signals. Both the input signal $u(t)$ and the output signal $y(t)$ are filtered with the same state variable filter. The choice of the filter parameters $f_i$ is relatively free; a Butterworth filter design is recommended, see [9.48]. A further possibility is the application of finite impulse response (FIR) filters, where the derivatives of the impulse response of a low-pass filter are convolved with the signal, [9.46], [9.59], [9.60].
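A minimal Python/NumPy sketch of LS estimation with a second-order state variable filter, for an assumed first-order process $a_1\dot y + y = b_0 u$ ($n = 1$, $m = 0$ in (9.111)). Input and output are passed through the same filter $F(s)$ with Butterworth coefficients $f_0 = \omega^2$, $f_1 = \sqrt 2\,\omega$ (the corner frequency $\omega = 5$ rad/s is an assumption), and the filtered signals enter the regression $y_F = [-\dot y_F,\; u_F]\,[a_1,\; b_0]^T$. The process is simulated with the same forward-Euler scheme as the filter, so in this noise-free illustration the filtered equation holds exactly; with real measurements, noise and discretization errors remain.

```python
import numpy as np

def svf(x, f0, f1, dt):
    """State variable filter F(s) = 1/(f0 + f1 s + s^2), cf. (9.123) with n = 2, integrated
    with forward Euler. Returns the filtered signal and its first derivative."""
    x1 = x2 = 0.0
    xf, dxf = np.empty(len(x)), np.empty(len(x))
    for k, xk in enumerate(x):
        xf[k], dxf[k] = x1, x2
        x1, x2 = x1 + dt * x2, x2 + dt * (xk - f0 * x1 - f1 * x2)
    return xf, dxf

# assumed first-order process a1 dy/dt + y = b0 u, cf. (9.111) with n = 1, m = 0
a1, b0, dt = 2.0, 1.5, 1e-3
rng = np.random.default_rng(11)
u = np.repeat(rng.choice([-1.0, 1.0], size=40), 500)   # random steps every 0.5 s, 20 s total
y = np.zeros(len(u))
for k in range(len(u) - 1):                             # Euler simulation of the process
    y[k + 1] = y[k] + dt * (b0 * u[k] - y[k]) / a1

w = 5.0                                                  # assumed filter corner frequency
f0, f1 = w ** 2, np.sqrt(2.0) * w                        # 2nd-order Butterworth denominator
yf, dyf = svf(y, f0, f1, dt)
uf, _ = svf(u, f0, f1, dt)

# filtered equation error model y_F = [-dy_F/dt, u_F] [a1, b0]^T, solved by LS (9.122)
Psi = np.column_stack([-dyf, uf])
a1_hat, b0_hat = np.linalg.lstsq(Psi, yf, rcond=None)[0]
print(a1_hat, b0_hat)
```

The design choice here is to filter both signals with the identical $F(s)$: since the filter commutes with the process differential equation, the filtered signals satisfy the same equation as the unfiltered ones.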
Fig. 9.4. State variable filter


For large signal-to-disturbance ratios, this least squares method has been shown 
to yield good results. For larger disturbance signals, consistent parameter estimation 
methods should be employed such as the instrumental variables method. 

Since these parameter estimation methods are based on discrete time estimation 
methods, a lot of results and estimation methods for discrete time models can be 
transferred to models for continuous time. 

9.2.6 Parameter estimation in closed loop

For parameter estimation in closed loop, special conditions have to be satisfied. The problem is obvious if correlation analysis is considered. For the convergence of the cross-correlation function between input $u(k)$ and output $y(k)$, the input $u(k)$ must be uncorrelated with the noise $n(k)$, see Figure 9.5. As feedback generates such a correlation, correlation methods cannot be applied directly if no external excitation signals are introduced. In the case of parameter estimation the situation is different, as only the error signal $e(k)$ must be uncorrelated with the elements of the data vector $\boldsymbol\psi^T(k)$. A detailed treatment of closed-loop identification is given in [9.31] and [9.17]. Two cases are considered now.



Fig. 9.5. Scheme for process identification in closed loop. $u_s$: additional perturbation signal
a) Closed loop identification without external excitation signal

If the closed loop is driven only by the noise n(k) and the reference value does not change, r(k) = 0, then direct LS estimation is based on the model

$$y(k) = \psi^T(k)\,\Theta = [-y(k-1) \;\dots\; -y(k-m_a) \;\; u(k-d-1) \;\dots\; u(k-d-m_b)]\,\Theta \qquad (9.124)$$

$\psi^T(k)$ is one row of the matrix $\Psi$ in (9.22). The feedback controller follows as


$$u(k-d-1) = -p_1 u(k-d-2) - \dots - p_\mu u(k-\mu-d-1) - q_0 y(k-d-1) - \dots - q_\nu y(k-\nu-d-1) \qquad (9.125)$$


$u(k-d-1)$ is therefore linearly dependent on the other elements of $\psi^T(k)$ if $\mu \le m_b - 1$ and $\nu \le m_a - d - 1$, because then all past inputs $u(k-d-2), \dots, u(k-\mu-d-1)$ and outputs $y(k-d-1), \dots, y(k-\nu-d-1)$ appearing in (9.125) are contained in $\psi^T(k)$. This linear dependence vanishes only for $\mu \ge m_b$ or $\nu \ge m_a - d$. Therefore, an identifiability condition is that the order of the controller must be large enough. If, for example, $m_b = 3$ and $m_a = 3$ with d = 0, the controller orders must satisfy $\mu \ge 3$ or $\nu \ge 3$; with d = 1, $\mu \ge 3$ or $\nu \ge 2$ must hold. If this identifiability condition is satisfied, one-step-ahead prediction error parameter estimation methods such as LS or ELS can be applied directly to the measured signals u(k) and y(k).

b) Closed loop identification with external perturbation 

If the closed loop is perturbed by a sufficiently exciting external input signal $u_s(k)$ or r(k), Figure 9.5, the process input $u(k-d-1)$, (9.125), is no longer linearly dependent on the elements of the data vector $\psi^T(k)$, and the process model can be identified directly from the measured signals u(k) and y(k) as in open loop. However, the orders of the process model must be known a priori.
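Both cases can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a method from the text: a hypothetical first-order process ($m_a = m_b = 1$, d = 0) with a = -0.8, b = 0.5 is controlled by $u(k) = -q_0 y(k) - q_1 y(k-1) + u_s(k)$, i.e. $\mu = 0$ and $\nu = 1$ when $q_1 \neq 0$. A pure P controller ($q_1 = 0$, $\nu = 0$) violates $\mu \ge m_b$ or $\nu \ge m_a - d$, so $\Psi^T\Psi$ becomes singular; either $q_1 \neq 0$ or an external perturbation $u_s(k)$ restores identifiability. Function names and numerical values are illustrative.

```python
import random


def simulate_closed_loop(a, b, q0, q1, n_samples, noise_std=1.0, us_std=0.0, seed=0):
    """Process y(k) = -a*y(k-1) + b*u(k-1) + n(k)  (m_a = m_b = 1, d = 0) in closed
    loop with u(k) = -q0*y(k) - q1*y(k-1) + u_s(k) and r(k) = 0.
    u_s(k) is an optional white-noise external perturbation with std us_std."""
    rng = random.Random(seed)
    y, u = [0.0], [0.0]
    for k in range(1, n_samples):
        y_k = -a * y[k - 1] + b * u[k - 1] + rng.gauss(0.0, noise_std)
        u_s = rng.gauss(0.0, us_std) if us_std > 0 else 0.0
        u_k = -q0 * y_k - q1 * y[k - 1] + u_s
        y.append(y_k)
        u.append(u_k)
    return u, y


def ls_estimate(u, y, tol=1e-9):
    """Direct LS estimate of theta = [a, b] with psi^T(k) = [-y(k-1), u(k-1)].
    Returns None if Psi^T Psi is numerically singular (not identifiable)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(1, len(y)):
        p1, p2 = -y[k - 1], u[k - 1]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y[k]
        r2 += p2 * y[k]
    det = s11 * s22 - s12 * s12
    if det <= tol * s11 * s22:  # normalized determinant = 1 - correlation^2
        return None
    # solve the 2x2 normal equations [Psi^T Psi] theta = Psi^T y (Cramer's rule)
    a_hat = (s22 * r1 - s12 * r2) / det
    b_hat = (s11 * r2 - s12 * r1) / det
    return a_hat, b_hat
```

With the seed and sample size used in a run of 20000 samples, the P-controlled loop without excitation returns None, whereas a controller with $q_1 \neq 0$ or an added $u_s(k)$ gives estimates close to a = -0.8 and b = 0.5; the exact digits depend on the noise sequence.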


9.3 Identification of nonlinear processes 

Many processes show nonlinear static and dynamic behavior, especially if wide operating ranges are considered. Examples are vehicles, aircraft, combustion engines, thermal plants and all processes with Coulomb friction and magnetic hysteresis. Therefore, the identification of nonlinear processes is of increasing interest.

Classical nonlinear models in combination with parameter estimation methods, as well as model architectures originating from the field of artificial neural networks, are well suited to the identification of nonlinear static and dynamic processes. The next sections present model architectures for the identification of processes with continuously differentiable nonlinearities. The last section deals with the experimental modelling of processes with non-continuously differentiable nonlinearities.
9.3.1 Parameter estimation for nonlinear static processes


A process is considered whose output signal depends nonlinearly on the input signal according to a polynomial

$$Y_u(k) = K_0 + U(k)K_1 + U^2(k)K_2 + \dots + U^q(k)K_q \qquad (9.126)$$

compare Figure 9.6. The output may be contaminated by a disturbance n(k):

$$Y(k) = Y_u(k) + n(k) \qquad (9.127)$$

Defining the vectors

$$\mathbf{u}^T(k) = [1 \;\; U(k) \;\; U^2(k) \;\dots\; U^q(k)], \qquad \mathbf{K}^T = [K_0 \;\; K_1 \;\; K_2 \;\dots\; K_q] \qquad (9.128)$$

leads with k = 0, ..., N - 1 to the regression model

$$\mathbf{Y} = \mathbf{U}\mathbf{K} + \mathbf{n} \qquad (9.129)$$


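Introducing the equation error

$$\mathbf{e} = \mathbf{Y} - \mathbf{U}\mathbf{K} \qquad (9.130)$$

where $\mathbf{U}$ and $\mathbf{Y}$ contain the measurements, shows that the equation error is linear in the parameters. Minimization of the loss function

$$V = \mathbf{e}^T\mathbf{e} \qquad (9.131)$$

with

$$\frac{dV}{d\mathbf{K}} = 0 \qquad (9.132)$$

yields the least squares estimate

$$\hat{\mathbf{K}} = [\mathbf{U}^T\mathbf{U}]^{-1}\,\mathbf{U}^T\mathbf{Y} \qquad (9.133)$$

Existence of this estimate requires

$$\det[\mathbf{U}^T\mathbf{U}] \neq 0 \qquad (9.134)$$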


which means that the input signal U(k) must change during the measurements; for a polynomial of order q, at least q + 1 different values of U(k) are required. If $E\{n(k)\} = 0$, the parameter estimates are consistent in mean square, [9.30].

For fault detection, one directly compares the estimated parameters $K_i$ or the nonlinear characteristic curves, e.g. for pumps.
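The estimate (9.133) can be computed directly from the normal equations. The following pure-Python sketch (hypothetical function name, no external libraries) builds the rows $\mathbf{u}^T(k)$, forms $\mathbf{U}^T\mathbf{U}$ and $\mathbf{U}^T\mathbf{Y}$ and solves them by Gaussian elimination; it raises an error when condition (9.134) is violated. For ill-conditioned problems, a QR-based solver such as NumPy's `numpy.linalg.lstsq` is numerically preferable to the normal equations.

```python
def fit_polynomial_characteristic(U, Y, q):
    """LS estimate K_hat = [U^T U]^{-1} U^T Y for Y(k) = K0 + U(k) K1 + ... + U(k)^q Kq."""
    rows = [[uk ** i for i in range(q + 1)] for uk in U]  # rows u^T(k)
    n = q + 1
    # normal equations A K = r with A = U^T U and r = U^T Y
    A = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    r = [sum(row[i] * yk for row, yk in zip(rows, Y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:  # crude absolute singularity threshold
            raise ValueError("det[U^T U] = 0: U(k) takes fewer than q + 1 distinct values")
        A[col], A[pivot] = A[pivot], A[col]
        r[col], r[pivot] = r[pivot], r[col]
        for i in range(col + 1, n):
            factor = A[i][col] / A[col][col]
            for j in range(col, n):
                A[i][j] -= factor * A[col][j]
            r[i] -= factor * r[col]
    # back substitution
    K = [0.0] * n
    for i in reversed(range(n)):
        K[i] = (r[i] - sum(A[i][j] * K[j] for j in range(i + 1, n))) / A[i][i]
    return K
```

A constant input, e.g. `fit_polynomial_characteristic([2.0] * 10, [3.0] * 10, 2)`, raises the error, which reflects the requirement that U(k) must change during the measurements.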
Fig. 9.6. Parameter estimation (regression) for a nonlinear characteristic


9.3.2 Parameter Estimation with Classical Nonlinear Models 

Classical methods for the identification of nonlinear dynamic systems are mostly based on polynomial approximators. One distinguishes between general approaches, e.g. Volterra series or Kolmogorov-Gabor polynomials, and approaches that involve special structure assumptions, such as Hammerstein, Wiener or nonlinear difference equation (NDE) models, [9.12], [9.18], see also [9.30], [9.31].

Static polynomial approximators have the advantage of being linear in the parameters. This advantage can be maintained for certain dynamic polynomial models, so that computationally expensive iterative optimization methods can be avoided.

In the following, the linear difference equation is written with the shift operator $q^{-1}$, where $q^{-i}\,y(k) = y(k-i)$,

$$A(q^{-1})\, y(k) = B(q^{-1})\, q^{-d}\, u(k) + D(q^{-1})\, v(k) \qquad (9.135)$$

according to (9.58).

The following examples of classical nonlinear dynamic models are based on the 
representation of the nonlinearity by polynomials. 


Generalized Hammerstein model

$$A(q^{-1})\, y(k) = B_1(q^{-1})\, u(k) + B_2(q^{-1})\, u^2(k) + \dots + D(q^{-1})\, v(k) \qquad (9.136)$$

Parametric Volterra model

$$A(q^{-1})\, y(k) = B_1(q^{-1})\, u(k) + B_2(q^{-1})\, u(k)\,[q^{-\alpha} u(k)] + D(q^{-1})\, v(k) \qquad (9.137)$$

Nonlinear Model, [9.35]

$$A_1(q^{-1})\, y(k) + A_2(q^{-1})\, y(k)\,[q^{-\alpha} y(k)] = B(q^{-1})\, u(k) + D(q^{-1})\, v(k) \qquad (9.138)$$


In the case of an equation error optimization, these models have the advantage of 
being linear in the parameters. Therefore, linear parameter estimation methods like 
LS, RLS and RELS can be applied directly, [9.35], [9.31]. 
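Because (9.136) is linear in the parameters, its data vector can be assembled from shifted and squared input samples, and the ordinary LS normal equations apply. The sketch assumes a hypothetical first-order instance with $A(q^{-1}) = 1 + a_1 q^{-1}$, $B_1(q^{-1}) = b_{11} q^{-1}$, $B_2(q^{-1}) = b_{21} q^{-1}$ and no noise model, so that $\psi^T(k) = [-y(k-1) \;\; u(k-1) \;\; u^2(k-1)]$ and $\Theta = [a_1 \;\; b_{11} \;\; b_{21}]^T$; the function names are illustrative.

```python
import random


def solve(A, r):
    """Solve A x = r by Gaussian elimination with partial pivoting (small systems)."""
    n = len(r)
    M = [row[:] + [ri] for row, ri in zip(A, r)]  # augmented matrix [A | r]
    for col in range(n):
        p = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[p] = M[p], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def estimate_hammerstein(u, y):
    """LS estimate of theta = [a1, b11, b21] for the first-order generalized
    Hammerstein model y(k) = -a1*y(k-1) + b11*u(k-1) + b21*u(k-1)**2."""
    psi = [[-y[k - 1], u[k - 1], u[k - 1] ** 2] for k in range(1, len(y))]
    target = y[1:]
    A = [[sum(p[i] * p[j] for p in psi) for j in range(3)] for i in range(3)]
    r = [sum(p[i] * t for p, t in zip(psi, target)) for i in range(3)]
    return solve(A, r)
```

With noise-free data generated by such a model, the estimate reproduces the true parameters up to rounding; with noise, the usual LS bias considerations for correlated equation errors apply.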
9.3.3 Artificial Neural Networks for Identification

For a general identification approach, methods are of interest that do not require specific knowledge of the process structure and hence are widely applicable. Artificial neural networks fulfil these requirements. They are composed of mathematically formulated neurons. At first, these neurons were used to describe the behavior of biological neurons, [9.39]. The interconnection of neurons in networks allowed the description of relationships between input and output signals, [9.53], [9.58]. In the sequel, artificial neural networks (ANNs) are considered that map input signals u to output signals y, Figure 9.7. Usually, the adaptable parameters of neural networks are unknown. As a result, they have to be adapted, “trained” or “learned” by processing measured signals u and y, [9.21], [9.20]. This is a typical system identification problem. If inputs and outputs are gathered into groups or clusters, a classification task results, e.g. in connection with pattern recognition, [9.7]. In the following, the problem of nonlinear system identification is considered (supervised learning). Thereby, the capability of ANNs to approximate nonlinear relationships to any desired degree of accuracy is utilized. Firstly, ANNs for describing static transfer behavior, [9.19], [9.51], will be investigated, which will then be extended to dynamic behavior, [9.2], [9.44], [9.32].


Fig. 9.7. System with P inputs $u_1, \dots, u_P$ and M outputs $y_1, \dots, y_M$, which has to be approximated by an artificial neural network


a) Artificial neural networks for static systems 

Neural networks are universal approximators for static nonlinearities and are consequently an alternative to polynomial approaches. Their advantages are that only little a priori knowledge about the process structure is needed and that single-input and multi-input processes are treated uniformly. In the following, it is assumed that a nonlinear system with P inputs and M outputs has to be approximated, see Figure 9.7.


Neuron model 

Figure 9.8 shows the block diagram of a neuron. In the input operator (synaptic function), a similarity measure between the input vector u and the (stored) weight vector w is formed, e.g. by the scalar product

$$x = \mathbf{w}^T\mathbf{u} = \sum_{i=1}^{P} w_i u_i = |\mathbf{w}|\,|\mathbf{u}|\cos\varphi \qquad (9.139)$$
or the Euclidean distance

$$x = \|\mathbf{u} - \mathbf{w}\|^2 = \sum_{i=1}^{P} (u_i - w_i)^2 \qquad (9.140)$$

Fig. 9.8. General neuron model


If w and u are similar, the resulting scalar quantity x will be large in the first case (scalar product) and small in the second case (distance). The quantity x, also called the activation of the neuron, is passed to the activation function, which yields the output value

$$y = \gamma(x - c) \qquad (9.141)$$

Figure 9.9 shows several examples of these, in general nonlinear, functions. The threshold c is a constant causing a parallel shift in the x-direction.
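The neuron model of Figure 9.8 can be sketched in a few lines: the input operator maps u and w to the scalar activation x, and the output follows from (9.141). The code below implements the two input operators (9.139) and (9.140) and the activation functions of Figure 9.9 as functions of z = x - c; the function names are illustrative assumptions.

```python
import math


def scalar_product(w, u):
    """Input operator (9.139): x = w^T u = sum_i w_i u_i."""
    return sum(wi * ui for wi, ui in zip(w, u))


def squared_distance(w, u):
    """Input operator (9.140): x = ||u - w||^2 = sum_i (u_i - w_i)^2."""
    return sum((ui - wi) ** 2 for wi, ui in zip(w, u))


# Activation functions of Fig. 9.9, written in terms of z = x - c
def sigmoid(z):
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return ez / (1.0 + ez)


def limiter(z):
    return max(-1.0, min(1.0, z))


def neutral_zone(z):
    if abs(z) <= 1.0:
        return 0.0
    return z - 1.0 if z > 1.0 else z + 1.0


def gauss(z):
    return math.exp(-z * z)


def binary(z):
    return 1.0 if z >= 0.0 else 0.0


def neuron(u, w, c, input_operator=scalar_product, activation=math.tanh):
    """General neuron: activation x from the input operator, output y = gamma(x - c), (9.141)."""
    x = input_operator(w, u)
    return activation(x - c)
```

For example, `neuron([1.0, 2.0], [0.5, 0.5], 1.5)` gives x = 1.5 and therefore tanh(0) = 0.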

Network structure 

The single neurons are interconnected to form a network structure, Figure 9.10. Hence, one has to distinguish between different layers with neurons arranged in parallel: the input layer, the first, second, ... hidden layer and the output layer. Generally, the input layer is used to scale the input signals and is often not counted as a separate layer. The actual network structure then begins with the first hidden layer. Figure 9.10 shows the most important types of internal links between neurons: feedforward, feedback, lateral and recurrent. With respect to their range of values, the input signals can be binary, discrete or continuous. Binary and discrete signals are used especially for classification, while continuous signals are used for identification tasks.

(a) hyperbolic tangent (tangens hyperbolicus)

$$y = \frac{e^{(x-c)} - e^{-(x-c)}}{e^{(x-c)} + e^{-(x-c)}} = \frac{2}{1 + e^{-2(x-c)}} - 1$$

(b) sigmoidal function

$$y = \frac{1}{1 + e^{-(x-c)}}$$

(c) limiter

$$y = \begin{cases} 1 & x - c > 1 \\ x - c & |x - c| \le 1 \\ -1 & x - c < -1 \end{cases}$$

(d) neutral zone

$$y = \begin{cases} 0 & |x - c| \le 1 \\ x - c - 1 & x - c > 1 \\ x - c + 1 & x - c < -1 \end{cases}$$

(e) Gauss function

$$y = e^{-(x-c)^2}$$

(f) binary function

$$y = \begin{cases} 0 & x - c < 0 \\ 1 & x - c \ge 0 \end{cases}$$

Fig. 9.9. Examples of activation functions

Fig. 9.10. Network structure: layers (input, hidden, output) and links (feedforward, feedback, recurrent) in a neural network
Multi-layer perceptron (MLP) network

The neurons of an MLP network are called perceptrons, Figure 9.11, and follow directly from the general neuron model shown in Figure 9.8. Typically, the input operator is realized as a scalar product, while the activation functions are realized by sigmoidal or hyperbolic tangent functions. The latter are smooth (arbitrarily often differentiable) functions yielding a neuron output y ≠ 0 over a wide range. Therefore, they have a global effect with extrapolation capability. The weights $w_i$ are assigned to the input operator and lie in the signal flow before the activation function.



Fig. 9.11. Perceptron neuron with weights $w_i$, summation of input signals (scalar product) and nonlinear activation function


The perceptrons are connected in parallel and arranged in consecutive layers to form a feedforward MLP network, Figure 9.12. Each of the P inputs affects each perceptron, so that in a hidden layer with K perceptrons there exist (K · P) weights $w_{kp}$. The output neuron is most often a perceptron with a linear activation function, Figure 9.13.

The adaptation of the weights $w_i$ based on measured input and output signals is usually realized by minimizing the quadratic loss function


$$J(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N} e^2(n), \qquad e(n) = y(n) - \hat{y}(n) \qquad (9.142)$$

where e is the model error, y is the measured output signal and $\hat{y}$ is the network output.

As in the case of parameter estimation with the least squares method, the condition

$$\frac{dJ(\mathbf{w})}{d\mathbf{w}} = 0 \qquad (9.143)$$

is formulated. Due to the nonlinear dependency on the weights, a direct solution is not possible. Therefore, numerical optimization methods, e.g. gradient methods, are applied. Because of the necessary back-propagation of errors through all hidden layers, the method is called “error back-propagation” or also “delta rule”. The so-called learning rate η has to be chosen (tested) suitably. In principle, gradient methods converge only slowly in the case of a large number of unknown parameters.

Fig. 9.12. Feedforward multi-layer perceptron network (MLP network). Three layers with (2-3-1) perceptrons. <K> is the first hidden layer
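A minimal sketch of error back-propagation, assuming a single input, one hidden layer of tanh perceptrons (weights $w_k$, thresholds $c_k$), a linear output neuron, batch gradient descent with a fixed learning rate η, and the loss (9.142) normalized by N (the factor 1/N only rescales η). Names, initialization and step size are illustrative choices, not prescriptions.

```python
import math
import random


def train_mlp(samples, n_hidden=5, eta=0.1, epochs=2000, seed=0):
    """Single-input, single-output MLP: one hidden layer of tanh perceptrons and a
    linear output neuron, trained by batch gradient descent (error back-propagation)
    on J = 1/(2N) * sum_n e(n)^2 with e(n) = y(n) - y_hat(n)."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # hidden-layer weights
    c = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # hidden-layer thresholds
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # output-layer weights
    v0 = 0.0                                               # output-layer bias
    n = len(samples)
    history = []
    for _ in range(epochs):
        gw = [0.0] * n_hidden
        gc = [0.0] * n_hidden
        gv = [0.0] * n_hidden
        gv0 = 0.0
        loss = 0.0
        for u, y in samples:
            # forward pass: y_hat = v0 + sum_k v_k * tanh(w_k * u - c_k)
            h = [math.tanh(w[k] * u - c[k]) for k in range(n_hidden)]
            y_hat = v0 + sum(v[k] * h[k] for k in range(n_hidden))
            e = y - y_hat
            loss += 0.5 * e * e / n
            # backward pass: propagate e through the output neuron and tanh' = 1 - h^2
            gv0 -= e / n
            for k in range(n_hidden):
                gv[k] -= e * h[k] / n
                delta = -e * v[k] * (1.0 - h[k] ** 2) / n  # dJ/d(w_k*u - c_k)
                gw[k] += delta * u
                gc[k] -= delta
        history.append(loss)
        # gradient step with learning rate eta
        v0 -= eta * gv0
        for k in range(n_hidden):
            v[k] -= eta * gv[k]
            w[k] -= eta * gw[k]
            c[k] -= eta * gc[k]
    return (w, c, v, v0), history
```

The returned loss history makes the comparatively slow convergence of plain gradient methods visible; increasing η speeds it up only until the iteration becomes unstable.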

Radial basis function (RBF) network

The neurons of RBF networks, Figure 9.14, compute the Euclidean distance in the input operator

$$x = \|\mathbf{u} - \mathbf{c}_m\|^2 \qquad (9.144)$$

and feed it to the activation function

$$G_m = \gamma_m(\|\mathbf{u} - \mathbf{c}_m\|^2) \qquad (9.145)$$

The activation function is given by radial basis functions, usually in the form of Gaussian functions with

$$\gamma_m = \exp\left[-\frac{1}{2}\left(\frac{(u_1 - c_{m1})^2}{\sigma_{m1}^2} + \frac{(u_2 - c_{m2})^2}{\sigma_{m2}^2} + \dots + \frac{(u_P - c_{mP})^2}{\sigma_{mP}^2}\right)\right] \qquad (9.146)$$


The centers $c_{mi}$ and the standard deviations $\sigma_{mi}$ are determined a priori so that the Gaussian functions are spread, e.g. uniformly, over the input space. The activation function determines the distance of each input signal to the center of the corresponding basis function. However, radial basis functions contribute to the model output only locally, namely in the vicinity of their centers. They possess less extrapolation capability, since their output values tend to zero with growing distance from their centers.



Fig. 9.14. Neuron with radial basis function 


Usually, radial basis function networks consist of two layers, Figure 9.15. The outputs $G_m$ are weighted and added up in a neuron of the perceptron type, Figure 9.13, so that

$$\hat{y} = \sum_{m=1}^{M} w_m\, G_m(\|\mathbf{u} - \mathbf{c}_m\|^2) \qquad (9.147)$$

Since the output layer weights are located behind the nonlinear activation functions in the signal flow, the error signal is linear in these parameters and, consequently, the least squares method can be applied in its explicit form. In comparison to MLP networks trained with gradient methods, a significantly faster convergence can be obtained. However, if the centers and standard deviations have to be optimized too, nonlinear numerical optimization methods are required as well.
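A sketch of the explicit LS solution for the output weights of (9.147), assuming a single input, Gaussian basis functions of the form (9.146) with a common, a priori fixed width σ and centers spread uniformly over the input range; only the weights $w_m$ are estimated. Function names and the test function are illustrative.

```python
import math


def gaussian_basis(u, centers, sigma):
    """G_m(u) = exp(-(u - c_m)^2 / (2 sigma^2)) for a scalar input u, cf. (9.146)."""
    return [math.exp(-0.5 * ((u - c) / sigma) ** 2) for c in centers]


def solve(A, r):
    """Solve A x = r by Gaussian elimination with partial pivoting (small systems)."""
    n = len(r)
    M = [row[:] + [ri] for row, ri in zip(A, r)]
    for col in range(n):
        p = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[p] = M[p], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_rbf_output_weights(samples, centers, sigma):
    """Centers and width fixed a priori: the output weights enter linearly, so they
    follow from the explicit LS solution of the normal equations."""
    G = [gaussian_basis(u, centers, sigma) for u, _ in samples]
    M = len(centers)
    A = [[sum(g[i] * g[j] for g in G) for j in range(M)] for i in range(M)]
    r = [sum(g[i] * y for g, (_, y) in zip(G, samples)) for i in range(M)]
    return solve(A, r)


def rbf_output(u, weights, centers, sigma):
    """Network output (9.147)."""
    return sum(w * g for w, g in zip(weights, gaussian_basis(u, centers, sigma)))
```

No iteration is needed here, in contrast to the gradient-based MLP training; the price is that centers and widths are not adapted.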

Local linear model networks 

The local linear model network (LOLIMOT) is an extended radial basis function network, [9.42], [9.43]. It is extended by replacing the output layer weights with a linear function of the network inputs, (9.148). Furthermore, the RBF network is normalized such that the sum of all basis functions is one. Thus, each neuron represents a local linear model with its corresponding validity function, see Figure 9.16. The validity functions determine the regions of the input space where each neuron is active. The general architecture of local model networks is extensively discussed in [9.41].

The kind of local model network discussed here utilizes normalized Gaussian validity functions (9.146) and an axis-orthogonal partitioning of the input space.
Fig. 9.16. Local linear model network (LOLIMOT)
Therefore, the validity functions can be composed of one-dimensional membership functions, and the network can be interpreted as a Takagi-Sugeno fuzzy model.

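The output of the local linear model network is calculated by

$$\hat{y} = \sum_{i=1}^{M} \Phi_i(\mathbf{u})\,(w_{i0} + w_{i1}u_1 + \dots + w_{iP}u_P) \qquad (9.148)$$

with the normalized Gaussian validity functions

$$\Phi_i(\mathbf{u}) = \frac{\mu_i(\mathbf{u})}{\sum_{j=1}^{M} \mu_j(\mathbf{u})} \qquad (9.149)$$

where

$$\mu_i(\mathbf{u}) = \exp\left[-\frac{1}{2}\sum_{j=1}^{P} \frac{(u_j - c_{ij})^2}{\sigma_{ij}^2}\right] \qquad (9.150)$$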

The centers $c_{ij}$ and standard deviations $\sigma_{ij}$ are nonlinear parameters, while the local model parameters $w_{ij}$ are linear parameters. The local linear model tree (LOLIMOT) algorithm is applied for training. It consists of an outer loop, in which the input space is decomposed by determining the parameters of the validity functions, and a nested inner loop, in which the parameters of the local linear models are optimized by locally weighted least squares estimation.
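The inner loop of the LOLIMOT algorithm can be sketched for a single input: the validity functions (9.149), (9.150) are evaluated for given centers and standard deviations, and each local model $w_{i0} + w_{i1}u$ is estimated by weighted least squares with the weights $\Phi_i(u)$. The outer splitting loop is omitted, and all names are hypothetical. Far away from every center all $\mu_i$ can underflow to zero, which a robust implementation would have to guard against.

```python
import math


def validity_functions(u, centers, sigmas):
    """Normalized Gaussian validity functions Phi_i(u), (9.149) and (9.150), for one input."""
    mu = [math.exp(-0.5 * ((u - c) / s) ** 2) for c, s in zip(centers, sigmas)]
    total = sum(mu)  # may underflow to 0 far from all centers (not guarded here)
    return [m / total for m in mu]


def local_wls(samples, centers, sigmas):
    """Inner LOLIMOT loop: weighted LS estimate of each local model w_i0 + w_i1*u,
    every sample (u, y) being weighted with Phi_i(u)."""
    phis = [validity_functions(u, centers, sigmas) for u, _ in samples]
    params = []
    for i in range(len(centers)):
        s0 = s1 = s2 = t0 = t1 = 0.0
        for (u, y), phi in zip(samples, phis):
            q = phi[i]
            s0 += q
            s1 += q * u
            s2 += q * u * u
            t0 += q * y
            t1 += q * u * y
        det = s0 * s2 - s1 * s1  # weighted normal equations [[s0, s1], [s1, s2]]
        params.append(((s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det))
    return params


def lolimot_output(u, params, centers, sigmas):
    """Network output (9.148): validity-weighted sum of the local linear models."""
    phi = validity_functions(u, centers, sigmas)
    return sum(p * (w0 + w1 * u) for p, (w0, w1) in zip(phi, params))
```

As an assumed example of the heuristic described next, two local models on [0, 1] with centers 0.25 and 0.75 and standard deviations of one third of the cell width (0.5/3) can be used; for data from a globally linear function, every local model recovers that function exactly.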

The input space is decomposed in an axis-orthogonal manner, yielding hyper-rectangles in whose centers the Gaussian validity functions $\mu_i(\mathbf{u})$ are placed. The standard deviations of these Gaussians are chosen proportional to the extension of the hyper-rectangles to account for the varying granularity. Thus, the nonlinear parameters $c_{ij}$ and $\sigma_{ij}$ are determined by a heuristic, avoiding explicit nonlinear optimization. LOLIMOT starts with a single linear model that is valid for the whole input space. In each iteration, it splits one local linear model into two new sub-models. Only the (locally) worst performing local model is considered for further refinement. Splits along all input axes are compared, and the best performing alternative is carried out, see Figure 9.17.

The main advantages of this local model approach are the inherent structure identification and the very fast and robust training algorithm. The model structure is adapted to the complexity of the process. Furthermore, the explicit application of time-consuming nonlinear optimization algorithms can be avoided.

Another local linear model architecture, the so-called hinging hyperplane trees, is presented in [9.11], [9.56]. These models can be interpreted as an extension of the LOLIMOT networks with respect to the partitioning scheme. While the LOLIMOT algorithm is restricted to axis-orthogonal splits, hinging hyperplane trees allow an axis-oblique decomposition of the input space. These more complex partitioning strategies lead to an increased effort in model construction. However, this feature becomes necessary in the case of strongly nonlinear model behavior and higher-dimensional input spaces.
The fundamental structures of three artificial neural networks have been described. These models are very well suited to the approximation of measured input/output data of static processes, compare also [9.19], [9.51]. For this, the training data have to be chosen such that the considered input space is covered with data as evenly as possible. After the training procedure, a parametric mathematical model of the static process behavior is available. Consequently, the output values $\hat{y}$ can be computed directly for arbitrary input combinations u.

Fig. 9.17. Tree construction of the LOLIMOT algorithm


An advantage of the automatic training procedure is the possibility of using arbitrarily distributed data in the training data set. There is no need to know data at exactly defined positions, as in the case of grid-based look-up table models, see Section 9.3.4. This clearly decreases the effort required for measurements in practical applications.
Example 9.1: Artificial neural network for the static behavior of a combustion engine

As an example, the engine characteristics of a six-cylinder SI (spark-ignition) engine are used. Here, the engine torque has to be identified as a function of the throttle angle and the engine speed. Figure 9.18 shows the 433 available data points that were measured on an engine test stand.

For the approximation, an MLP network is applied. The approximation of the measurement data obtained after training is shown in Figure 9.19. For this purpose, 31 parameters are required. Obviously, the neural network possesses good interpolation and extrapolation capabilities. This also means that in areas with only few training data, the process behavior can be approximated quite well, [9.25].



Fig. 9.18. Measured SI engine data (2.5 l, V6 cyl.): unevenly distributed, 433 measurement data points


□ 


b) Artificial neural networks for dynamic systems 

The memoryless static networks can be extended with dynamic elements to dynamic neural networks. One can distinguish between neural networks with external and internal dynamics, [9.44], [9.32]. ANNs with external dynamics are based on static networks, e.g. MLP or RBF networks. The discrete-time input signals u(k) are passed to the network through additional filters $F_j(q^{-1})$. In the same way, either the measured output signals y(k) or the NN outputs $\hat{y}(k)$ are passed to the network through filters $G_j(q^{-1})$. The operator $q^{-1}$ denotes a time shift

$$q^{-1}\,y(k) = y(k-1) \qquad (9.151)$$
Fig. 9.19. Approximation of measured engine data (+) with an MLP network (2-6-1): 31 parameters


In the simplest case, the filters are pure time delays, Figure 9.20a,

$$\hat{y}(k) = f_{NN}[u(k), u(k-1), \dots, \hat{y}(k-1), \hat{y}(k-2), \dots] \qquad (9.152)$$

where the time-shifted sampled values are the network input signals. The structure in Figure 9.20a shows a parallel model (equivalent to the output error model for parameter estimation of linear models). In Figure 9.20b, the measured output signal is passed to the network input. Then, the series-parallel model is obtained (equivalent to the equation error model for parameter estimation of linear models). One advantage of the external dynamic approach is the possibility of using the same adaptation methods as in the case of static networks. The drawbacks, however, are the increased dimensionality of the input space, possible stability problems and an iterative way of computing the static model behavior, namely through simulation of the model: for example, a step function is used as the input signal, and one has to wait until the steady state of the model is reached.
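The difference between the parallel and the series-parallel configuration lies only in which signal is fed back to the network input. The sketch below assumes the simplest delay structure with one input delay and one output delay, $\hat{y}(k) = f(u(k-1), y(k-1))$ or $f(u(k-1), \hat{y}(k-1))$, and treats the trained network as an arbitrary Python callable f; the names are illustrative.

```python
def series_parallel_prediction(f, u, y):
    """Series-parallel model: the measured output y(k-1) is fed to the network,
    y_hat(k) = f(u(k-1), y(k-1)), i.e. a one-step-ahead prediction."""
    return [y[0]] + [f(u[k - 1], y[k - 1]) for k in range(1, len(y))]


def parallel_simulation(f, u, y0):
    """Parallel model: the model output y_hat(k-1) is fed back,
    y_hat(k) = f(u(k-1), y_hat(k-1)), i.e. a simulation of the model."""
    y_hat = [y0]
    for k in range(1, len(u)):
        y_hat.append(f(u[k - 1], y_hat[k - 1]))
    return y_hat
```

For a step input and a slightly wrong model, e.g. f(u, y) = 0.6 y + u for a process y(k) = 0.5 y(k-1) + u(k-1) with steady state 2.0, the series-parallel prediction at steady state is 2.2, whereas the parallel simulation converges to 2.5: the model error accumulates through the feedback, and the steady state is only obtained by simulating until it is reached.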

ANNs with internal dynamics realize dynamic elements inside the model structure. According to the kind of included dynamic elements, one can distinguish between recurrent networks, partially recurrent networks and locally recurrent globally feedforward networks (LRGF), [9.44]. The LRGF networks maintain the structure of static networks except that dynamic neurons are utilized, see Figure 9.21. The following can be distinguished: local synapse feedback, local activation feedback and local output feedback. The simplest case is local activation feedback, [9.2]. Here, each neuron is extended by a linear transfer function, most often of first or second order, see Figure 9.22. The dynamic parameters $a_i$ and $b_i$ are adapted. Static and dynamic behavior can be easily distinguished, and stability can be guaranteed.

Usually, MLP networks are used in LRGF structures. However, RBF networks with dynamic elements in the output layer can be applied as well if a Hammerstein structure of the process can be assumed, [9.2]. Usually, the adaptation of these dynamic NNs is based on extended gradient methods, [9.44].

Fig. 9.20. Artificial neural network with external dynamics: (a) parallel model; (b) series-parallel model

Fig. 9.21. Dynamic neurons for neural networks with internal dynamics: (a) local synapse feedback; (b) local activation feedback; (c) local output feedback

Fig. 9.22. Dynamic perceptron, [9.2]

Based on the basic structure of ANNs, special structures with particular properties can be built. If, for example, the local linear model network (LOLIMOT) is combined with the external dynamic approach, a model structure with locally valid linear input/output models results.

c) Semi-physical models 

Frequently, the static or dynamic behavior of processes depends on the operating point, described by the variables z. Then all inputs have to be separated into manipulated variables u and operating point variables z. By this separation, local linear models can be identified with parameters that vary with the operating point, also called linear parameter-varying models (LPVM), [9.5].

A nonlinear discrete-time dynamic model with p inputs $u_i$ and one output y can be described by

$$\hat{y}(k) = f(\mathbf{x}(k)) \qquad (9.153)$$

with

$$\mathbf{x}^T(k) = [u_1(k-1), \dots, u_1(k-n_{u_1}), \dots, u_p(k-1), \dots, u_p(k-n_{u_p}), \; y(k-1), \dots, y(k-n_y)] \qquad (9.154)$$

For many types of nonlinearities, this nonlinear (global) overall model can be represented as a combination of locally active submodels
$$\hat{y} = \sum_{i=1}^{M} \Phi_i(\mathbf{u})\, g_i(\mathbf{u}) \qquad (9.155)$$

The validity of each submodel $g_i$ is given by its corresponding weighting function $\Phi_i$ (also called activation or membership function). These weighting functions describe the partitioning of the input space and determine the transition between neighboring submodels, [9.45], [9.3], [9.41], [9.43]. Different local models result from the way the input space u is partitioned (e.g. grid structure, axis-orthogonal cuts, axis-oblique cuts), from the local model structure and from the transition between submodels, [9.57].

Due to their transparent structure, local models offer the possibility of adjusting the model structure to the process structure in terms of relationships based on physical laws. Such an incorporation of physical insight considerably improves the training and the generalization behavior and, in many cases, reduces the required model complexity.

According to (9.155), identical input spaces have been assumed for the local submodels $g_i(\mathbf{u})$ and the membership functions $\Phi_i(\mathbf{u})$. However, local models allow the realization of distinct input spaces, Figure 9.23, with

$$\hat{y} = \sum_{i=1}^{M} \Phi_i(\mathbf{z})\, g_i(\mathbf{x}) \qquad (9.156)$$

The input vector z of the weighting functions comprises only those inputs of the vector u having significant nonlinear effects that cannot be explained by the local submodels. Only those directions require a subdivision into different parts. The decisive advantage of this procedure is the considerable reduction of the number of inputs in z. Thus, the difficult task of structure identification can be simplified.



Fig. 9.23. Structure of local model approaches with distinct input spaces for local submodels and membership functions


The use of separate input spaces for the local models (vector x) and the membership functions (vector z) becomes clearer by considering another representation of the structure in (9.156). As local model approaches are normally assumed to be linear with respect to their parameters according to

$$g_i(\mathbf{x}) = w_{i0} + w_{i1}x_1 + \dots + w_{i n_x} x_{n_x} \qquad (9.157)$$

(9.156) can be rearranged to

$$\hat{y} = w_0(\mathbf{z}) + w_1(\mathbf{z})\,x_1 + \dots + w_{n_x}(\mathbf{z})\,x_{n_x} \quad \text{with} \quad w_j(\mathbf{z}) = \sum_{i=1}^{M} w_{ij}\,\Phi_i(\mathbf{z}) \qquad (9.158)$$

Thus, the specified local model approaches can be interpreted as linear-in-the-parameters relationships with operating-point-dependent parameters $w_j(\mathbf{z})$, whereby these parameters depend on the input values in vector z. Consequently, the process coefficients $w_j(\mathbf{z})$ still have a physical meaning. Therefore, these models are called semi-physical models, [9.57].
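Equation (9.158) shows that the semi-physical model is a linear model whose coefficients are interpolated over the operating point. A minimal sketch, assuming a scalar operating-point variable z, normalized Gaussian weighting functions with a common width, and a matrix W of local parameters $w_{ij}$ (all names hypothetical):

```python
import math


def membership(z, centers, sigma):
    """Normalized Gaussian weighting functions Phi_i(z) for a scalar operating-point variable z."""
    mu = [math.exp(-0.5 * ((z - c) / sigma) ** 2) for c in centers]
    total = sum(mu)
    return [m / total for m in mu]


def operating_point_parameters(z, W, centers, sigma):
    """w_j(z) = sum_i w_ij * Phi_i(z), cf. (9.158); W[i][j] holds w_ij of local model i."""
    phi = membership(z, centers, sigma)
    return [sum(phi[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]


def semi_physical_output(x, z, W, centers, sigma):
    """y_hat = w_0(z) + w_1(z) x_1 + ... + w_nx(z) x_nx, cf. (9.158)."""
    w = operating_point_parameters(z, W, centers, sigma)
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
```

With two local models whose gains are 1 and 3, the interpolated gain $w_1(z)$ is 2 halfway between the two centers, so the coefficient keeps its interpretation as an operating-point-dependent process gain; the result is identical to evaluating (9.156) directly.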

The choice of appropriate submodel structures always requires a compromise between submodel complexity and the number of submodels. The most often applied linear submodels have the advantage of being a direct extension of the well-known linear models. However, under certain conditions more complex submodels may be reasonable. If the main nonlinear influence of the input variables can be described qualitatively by a nonlinear transformation of the input variables (e.g. $f_i(\mathbf{x}) = x_1^2, x_1 x_2, \dots$), then incorporating that knowledge into the submodels leads to a considerable reduction of the required number of submodels. Generally, this approach can be realized by pre-processing the input variables x into the nonlinearly transformed variables, Figure 9.24,

$$\mathbf{x}^* = F(\mathbf{x}) = [f_1(\mathbf{x}) \;\; f_2(\mathbf{x}) \;\dots\; f_p(\mathbf{x})]^T \qquad (9.159)$$

Besides these heuristically determined model structures, local model approaches also enable the incorporation of fully physically determined models. Furthermore, local models allow the employment of inhomogeneous models, i.e. different local submodel structures may be valid within the different operating regimes.



Fig. 9.24. Pre-processing of input variables x for incorporation of prior knowledge into the 
submodel structure 
9.3.4 Identification with Grid-based Look-up Tables for Static Processes

In this section, a further nonlinear model architecture besides polynomial-based models, neural networks and fuzzy systems is presented. Grid-based look-up tables are the most common type of nonlinear static models used in practice. Especially in the field of nonlinear control, look-up tables are widely accepted as they provide a transparent and flexible representation of nonlinear relationships. Electronic control units of modern automobiles, for example, contain about 100 such grid-based look-up tables, in particular for engine and emission control, [9.8].

In automotive applications, computational power and storage capacity are strongly restricted for cost reasons. Furthermore, real-time constraints have to be met. Under these conditions, grid-based look-up tables are a suitable means of storing nonlinear static mappings. The models consist of a set of data points or nodes positioned on a multidimensional grid. Each node comprises two components: the data point position and the scalar data point height. The heights are estimates of the approximated nonlinear function at their corresponding data point positions. All nodes located on grid lines, as shown in Figure 9.25, are stored, e.g. in the ROM of the control unit. For model generation, usually all data point positions are fixed a priori. The most widely applied method of obtaining the data point heights is to position measurement data points directly on the grid. In this way, an optimization can be avoided.
Fig. 9.25. Grid-based look-up table of a six-cylinder SI engine


In the following, the most common two-dimensional case will be considered. 
The calculation of the desired output Z for given input values X and Y consists of 
two steps. In the first step, the indices of the enclosing four data points have to be 
selected. Then, a bilinear area interpolation is performed, [9.54]. For this, four areas 
have to be calculated, as shown in Figure 9.26, [9.54], [9.56]. 
Fig. 9.26. Areas for interpolation within a look-up table


For the calculation of the desired output Z, the four selected data point heights are weighted with the opposite areas and added up. Finally, this result is divided by the overall area, (9.160):

$$Z(X,Y) = \frac{\begin{aligned}&\underbrace{Z(i,j)\,(X(i+1)-X)\,(Y(j+1)-Y)}_{\text{area 1}} + \underbrace{Z(i+1,j)\,(X-X(i))\,(Y(j+1)-Y)}_{\text{area 2}}\\ &+ \underbrace{Z(i,j+1)\,(X(i+1)-X)\,(Y-Y(j))}_{\text{area 3}} + \underbrace{Z(i+1,j+1)\,(X-X(i))\,(Y-Y(j))}_{\text{area 4}}\end{aligned}}{\underbrace{(X(i+1)-X(i))\,(Y(j+1)-Y(j))}_{\text{overall area}}} \qquad (9.160)$$


Because of the relatively simple computational algorithm, area interpolation rules are widely applied, especially in real-time applications. The accuracy of the method depends on the number of grid positions. For the approximation of “smooth” mappings, a small number of data points is sufficient, while for stronger nonlinear behavior a finer grid has to be chosen.
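The two steps described above (selecting the indices of the enclosing data points, then weighting with the opposite areas according to (9.160)) can be sketched in Python. This is a minimal sketch, not a control-unit implementation; the function name `lookup_2d`, the nested-list storage `Z[i][j]` and the use of the border cell for inputs outside the grid (which amounts to linear extrapolation) are assumptions made here.

```python
import bisect

def lookup_2d(xs, ys, Z, x, y):
    """Bilinear area interpolation in a grid-based look-up table, cf. (9.160).

    xs, ys : ascending grid positions of the two inputs
    Z      : data point heights, Z[i][j] belongs to (xs[i], ys[j])
    """
    # Step 1: indices of the four enclosing data points (border cell used outside the grid)
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect.bisect_right(ys, y) - 1, 0), len(ys) - 2)
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[j], ys[j + 1]
    # Step 2: weight each height with the area opposite to it
    area1 = (x1 - x) * (y1 - y)    # weight of Z(i, j)
    area2 = (x - x0) * (y1 - y)    # weight of Z(i+1, j)
    area3 = (x1 - x) * (y - y0)    # weight of Z(i, j+1)
    area4 = (x - x0) * (y - y0)    # weight of Z(i+1, j+1)
    total = (x1 - x0) * (y1 - y0)  # overall area
    return (Z[i][j] * area1 + Z[i + 1][j] * area2
            + Z[i][j + 1] * area3 + Z[i + 1][j + 1] * area4) / total
```

For heights taken from a function that is itself bilinear, such as $Z = X + Y$ on the grid `xs = [0, 1, 2]`, `ys = [0, 10]`, the interpolation is exact: the point $(0.5, 5)$ returns 5.5.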

The area interpolation is based on the assumption that all data point heights are 
available in the whole range covered by the grid. However, this condition is often not 
fulfilled. 
Grid-based look-up tables belong to the class of nonparametric models. The described model structure has the advantage that a subsequent adaptation of single data point heights due to changing environmental conditions is easy to realize. However, the main disadvantage of this look-up table is the exponential growth of the number of data points with an increasing number of inputs. Therefore, grid-based look-up tables are restricted to one- and two-dimensional input spaces in practical applications. The determination of the heights of the look-up table from measurements at arbitrary coordinates with parameter estimation methods is treated by [9.40].

Another alternative is parametric model representations, like polynomial models, neural networks or fuzzy models, which clearly require fewer model parameters to approximate a given input-output relationship. Therefore, the storage demand of these models is much lower. However, in contrast to area interpolation, the computation of the output is much more complex, since nonlinear functions have to be evaluated for each neuron or fuzzy rule. On the other hand, grid-based look-up tables are not suitable for the identification and modelling of dynamic process behavior.

A detailed overview of model structures for nonlinear system identification is 
given in [9.43]. 

9.3.5 Parameter Estimation for Non-continuously Differentiable Nonlinear 
Processes (Friction and Backlash) 

Non-continuously differentiable nonlinear processes appear in mechanical systems, especially in the form of friction and backlash, and in electrical systems with magnetization hysteresis.


a) Processes with friction 


In many mechanical processes, dry and viscous friction occur. Here, dry friction can be modelled as a constant with a velocity-dependent sign. In steady state, a hysteresis curve arises from the dynamic relationship.

For the identification of processes with friction, the hysteresis curve can be found directly, point by point, by slow continuous or stepwise changes of the input signal $u(t) = y_1(t)$ and measurement of $y(t) = y_2(t)$.

If the hysteresis curves are described by

$$
\begin{aligned}
y^{+}(u) &= K_{0+} + K_{1+}\,u \\
y^{-}(u) &= K_{0-} + K_{1-}\,u
\end{aligned}
\tag{9.161}
$$


then the parameters can be estimated from $\nu = 1, 2, \dots, N$ measured points with the least squares method

$$
\hat{K}_{1\pm} = \frac{N \sum u(\nu)\, y_{\pm}(\nu) - \sum u(\nu) \sum y_{\pm}(\nu)}{N \sum u^{2}(\nu) - \sum u(\nu) \sum u(\nu)} \tag{9.162}
$$

$$
\hat{K}_{0\pm} = \frac{1}{N}\Big[\sum y_{\pm}(\nu) - \hat{K}_{1\pm} \sum u(\nu)\Big] \tag{9.163}
$$
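Equations (9.162) and (9.163) are the ordinary least-squares fit of a straight line. A minimal Python sketch, assuming the measured points have already been sorted into the branch for increasing input ($y^+$) and the branch for decreasing input ($y^-$); the name `fit_branch` is hypothetical:

```python
def fit_branch(u, y):
    """Least-squares straight line y = K0 + K1*u for one hysteresis branch, (9.162)-(9.163)."""
    N = len(u)
    su, sy = sum(u), sum(y)
    suy = sum(ui * yi for ui, yi in zip(u, y))   # sum of u(v) * y(v)
    suu = sum(ui * ui for ui in u)               # sum of u(v)^2
    K1 = (N * suy - su * sy) / (N * suu - su * su)
    K0 = (sy - K1 * su) / N
    return K0, K1
```

Calling it once per branch gives $\hat{K}_{0+}, \hat{K}_{1+}$ and $\hat{K}_{0-}, \hat{K}_{1-}$; for the exact line $y = 1 + 2u$ sampled at $u = 0, 1, 2, 3$ it returns $(1, 2)$.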


As the differential equations are linear in the parameters, direct methods of parameter 
estimation can be applied for processes with dry and viscous friction in motion. 
For this, both differential and difference equations are well-suited process models. 
In some cases, it is expedient not only to use velocity-dependent dry friction but 
also velocity direction-dependent dynamic parameters, e.g. in the form of difference 
equations 


$$
y(k) = -\sum_{i=1}^{m} a_{i+}\, y(k-i) + \sum_{i=1}^{m} b_{i+}\, u(k-i) + K_{0+} \tag{9.164}
$$

$$
y(k) = -\sum_{i=1}^{m} a_{i-}\, y(k-i) + \sum_{i=1}^{m} b_{i-}\, u(k-i) + K_{0-} \tag{9.165}
$$

$K_{0+}$ and $K_{0-}$ can be understood as direction-dependent offsets or DC values. Then, the following methods can be applied for the estimation of these offsets:

• implicit estimation of the offset parameters $K_{0+}$ and $K_{0-}$;

• explicit estimation of the offset parameters $K_{0+}$ and $K_{0-}$ with generation of differences $\Delta y(k)$ and $\Delta u(k)$ and parameter estimation for

$$
\Delta y(k) = -\sum_{i=1}^{m} a_i\, \Delta y(k-i) + \sum_{i=1}^{m} b_i\, \Delta u(k-i) \tag{9.166}
$$

with the assumption of velocity-independent dynamic parameters $a_i$ and $b_i$. Then, for each direction, the parameters $K_{0+}$ and $K_{0-}$ have to be estimated separately.
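For the implicit variant, the offset is treated as one more parameter whose regressor is the constant 1. The following NumPy sketch covers only the first-order case ($m = 1$) of (9.164)/(9.165) and assumes that the data passed in belong to a single direction of motion; the function name and the batch solution with `numpy.linalg.lstsq` (instead of a recursive estimator) are choices made for illustration.

```python
import numpy as np

def estimate_direction_model(u, y):
    """Implicit LS estimate of a1, b1 and the offset K0 in
    y(k) = -a1*y(k-1) + b1*u(k-1) + K0   (first-order case of (9.164)/(9.165)).

    u, y must come from a segment in which the motion keeps one direction.
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    # regressor rows [-y(k-1), u(k-1), 1]; the constant column carries the offset
    Psi = np.column_stack([-y[:-1], u[:-1], np.ones(len(y) - 1)])
    theta, *_ = np.linalg.lstsq(Psi, y[1:], rcond=None)
    return theta  # array [a1, b1, K0]
```

Running it separately on the data of each direction yields the two parameter sets with $K_{0+}$ and $K_{0-}$.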

For this parameter estimation method with a direction-dependent model, an additional identification requirement has to be considered: the motion must take place in only one direction, without reversal. This means that

$$
\dot{y}(t) > 0 \quad \text{or} \quad \dot{y}(t) < 0 \tag{9.167}
$$

has to be satisfied, which can be tested by

$$
\Delta y(k) > \varepsilon \quad \text{or} \quad \Delta y(k) < -\varepsilon \tag{9.168}
$$

for all $k$, with $\varepsilon \geq 0$.

A test signal for proportional acting processes fulfilling this condition was proposed by [9.38], Figure 9.27. The motion in one direction with a certain velocity is generated by a linear ascent. This is followed by a step for the excitation of higher frequencies and a transition to a steady-state condition. In the case of a reversal of motion, the parameter estimation has to be stopped (in Figure 9.27 the points 1, 2, 3, ...) and has either to be restarted or continued with values according to the same direction.
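The stop/restart logic for reversals, based on the test (9.168), can be sketched as a segmentation of the sampled output into runs of constant direction. This is a hypothetical helper, not the procedure of [9.38]; samples with $|\Delta y(k)| \le \varepsilon$ are treated here as belonging to neither direction.

```python
def direction_segments(y, eps=0.0):
    """Split samples y(0..N-1) into runs with dy > eps (sign +1) or dy < -eps (sign -1).

    Returns a list of (start, end, sign) with inclusive sample indices.
    """
    segments = []
    start, sign = 0, 0
    for k in range(1, len(y)):
        dy = y[k] - y[k - 1]
        s = 1 if dy > eps else (-1 if dy < -eps else 0)
        if s != sign:
            if sign != 0:                   # close the current run at the reversal
                segments.append((start, k - 1, sign))
            start, sign = k - 1, s          # new run starts at the last common sample
    if sign != 0:
        segments.append((start, len(y) - 1, sign))
    return segments
```

Each returned tuple `(start, end, sign)` marks a sample range that can be passed to a direction-specific estimator; for example, `direction_segments([0, 1, 2, 1, 0])` gives `[(0, 2, 1), (2, 4, -1)]`.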

The hysteresis curve can be computed from the static behavior of the model (9.164), (9.165)

$$
y^{+} = \frac{\sum_{i=1}^{m} b_{i+}}{1 + \sum_{i=1}^{m} a_{i+}}\, u + \frac{K_{0+}}{1 + \sum_{i=1}^{m} a_{i+}} \tag{9.169}
$$

$$
y^{-} = \frac{\sum_{i=1}^{m} b_{i-}}{1 + \sum_{i=1}^{m} a_{i-}}\, u + \frac{K_{0-}}{1 + \sum_{i=1}^{m} a_{i-}} \tag{9.170}
$$

Fig. 9.27. Test signal for parameter estimation of processes with dry friction

For the verification of the parameter estimation based on the dynamic behavior, the 
computed characteristic curve can be compared with the measured curve resulting 
directly from the measured static behavior. 
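For this comparison, (9.169) and (9.170) reduce to a slope and an offset per direction. A minimal sketch with a hypothetical function name, taking the estimated coefficient lists of one direction:

```python
def static_branch(a, b, K0):
    """Static characteristic y = K1s*u + K0s of one direction model (9.164) or (9.165).

    a, b : estimated coefficients a_1..a_m and b_1..b_m of that direction
    K0   : its estimated offset
    """
    den = 1.0 + sum(a)  # requires 1 + sum(a) != 0 (no integral action)
    return sum(b) / den, K0 / den
```

For example, $a_1 = -0.8$, $b_1 = 0.5$, $K_0 = 0.3$ give the static slope 2.5 and offset 1.5, which could then be compared with a straight line fitted to the measured static points.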

For rotary drives, [9.23], [9.22] have developed a special parameter estimation method that correlates the measured torque with the rotational acceleration and estimates the moment of inertia. From this, the characteristic curve of the friction torque can be estimated in a nonparametric form.

The methods described above for the identification of processes with friction have been successfully tested in practical applications and applied to digital control with friction compensation, see [9.38], [9.52]. Further treatment is given by [9.1], [9.9].


b) Processes with backlash (dead zone) 

As an example again, an oscillator with backlash or dead zone of width $2y_t$ is considered, Figure 9.28. For the oscillator without backlash, it is

$$
m\,\ddot{y}_2(t) + d\,\dot{y}_2(t) + c\,y_2(t) = c\,y_3(t) \tag{9.171}
$$

The backlash can be described as follows

$$
y_3(t) = \begin{cases} y_1(t) - y_t & \text{for } y_1(t) > y_t \\ 0 & \text{for } -y_t \leq y_1(t) \leq y_t \\ y_1(t) + y_t & \text{for } y_1(t) < -y_t \end{cases} \tag{9.172}
$$

This equation leads to the nonlinear characteristic shown in Figure 9.28b. In the case where the backlash is at one restriction ($y_1(t) > y_t$), it is

$$
m\,\ddot{y}_2(t) + d\,\dot{y}_2(t) + c\,y_2(t) + c\,y_t = c\,y_1(t) \tag{9.173}
$$
and for the other restriction with $y_1(t) < -y_t$

$$
m\,\ddot{y}_2(t) + d\,\dot{y}_2(t) + c\,y_2(t) - c\,y_t = c\,y_1(t) \tag{9.174}
$$

The backlash appears as a constant with a sign depending on the sign of $y_1(t)$. For the range inside the backlash, $y_3(t) = 0$ holds and the system follows its eigen-behavior

$$
m\,\ddot{y}_2(t) + d\,\dot{y}_2(t) + c\,y_2(t) = 0 \tag{9.175}
$$

if point 3 is fixed (for instance, because of a friction not considered). If point 3 is not fixed and can move arbitrarily inside the backlash, the spring forces do not apply. Then, one has to set $y_2 = y_3$ and $c = 0$ in (9.175).




Fig. 9.28. Mechanical oscillator with backlash (dead zone): (a) schematic set-up; (b) block diagram for the cases $y_1(t) > y_t$ and $y_1(t) < -y_t$


One obtains a simplified block diagram for the regions outside the backlash 
shown in Figure 9.29. The effect of the backlash in these regions can be interpreted 
as an offset shift of the input signal with changing sign. 
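The dead-zone characteristic (9.172) and its effect on the oscillator (9.171) can be checked numerically. In this sketch the parameter values $m = 1$, $d = 2$, $c = 4$, the step size and the semi-implicit Euler integration are assumptions for illustration; inside the backlash, point 3 is assumed fixed, as for (9.175).

```python
def backlash_output(y1, yt):
    """Dead-zone characteristic (9.172): output y3 for input position y1, half-width yt."""
    if y1 > yt:
        return y1 - yt
    if y1 < -yt:
        return y1 + yt
    return 0.0

def simulate_oscillator(y1, yt, m=1.0, d=2.0, c=4.0, dt=1e-3, t_end=20.0):
    """Response y2(t_end) of (9.171) to a constant input position y1 through the backlash.

    Inside the backlash y3 = 0, i.e. point 3 is assumed fixed as in (9.175).
    Integration: semi-implicit Euler with step dt.
    """
    y2, v2 = 0.0, 0.0
    y3 = backlash_output(y1, yt)  # constant here because y1 is held constant
    for _ in range(int(round(t_end / dt))):
        acc = (c * y3 - d * v2 - c * y2) / m
        v2 += dt * acc
        y2 += dt * v2
    return y2
```

For a constant input $y_1 = 1$ and $y_t = 0.2$ the oscillator settles at $y_2 = 0.8$, i.e. the input appears shifted by the offset $y_t$, as in Figure 9.29.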


9.4 Symptom generation with identification models 

a) Parameter estimation 

As discussed in Chapter 1, certain process parameters change if they are influenced by faults. This means that either one parameter or several parameters change compared to the normal parameter set $\Theta_0$
Fig. 9.29. Simplified block diagram for a linear system with backlash for $|y_1(t)| > |y_t|$


$$
\Delta\Theta = \Theta(k) - \Theta_0 \tag{9.176}
$$

As the parameters usually fluctuate because of signal disturbances and unmodelled effects, thresholds $\pm\Delta\Theta_{th}$ have to be defined for normal low and high values, and limit checking has to take place, see Chapter 7. Parameter changes which exceed the thresholds are then called symptoms

$$
\begin{aligned}
\Delta\Theta_s &= \Theta(k) - (\Theta_0 + \Delta\Theta_{th}) &&\text{for } \Theta(k) > \Theta_0 \\
\Delta\Theta_s &= \Theta(k) - (\Theta_0 - \Delta\Theta_{th}) &&\text{for } \Theta(k) < \Theta_0
\end{aligned}
$$
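The threshold rule can be written as a small vectorized function. In this NumPy sketch (function name hypothetical), a parameter inside the band $\Theta_0 \pm \Delta\Theta_{th}$ is assigned the symptom 0; this is an assumption consistent with calling only threshold-exceeding changes symptoms.

```python
import numpy as np

def parameter_symptoms(theta, theta0, dtheta_th):
    """Excess of each estimate Theta(k) beyond the band Theta0 +/- dTheta_th; 0 inside the band."""
    theta = np.asarray(theta, dtype=float)
    upper = np.asarray(theta0, dtype=float) + np.asarray(dtheta_th, dtype=float)
    lower = np.asarray(theta0, dtype=float) - np.asarray(dtheta_th, dtype=float)
    return np.where(theta > upper, theta - upper,
                    np.where(theta < lower, theta - lower, 0.0))
```

The signs of the resulting vector form the kind of parameter-change pattern that the following diagnosis step can evaluate.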

In the case of parameter estimation with continuous-time models, the parameter estimates can then be directly related to physical process coefficients by using the inverse algebraic relationship

$$
\mathbf{p} = \mathbf{f}^{-1}(\boldsymbol{\Theta})
$$

see (5.24). The changes of the basic physical process coefficients then allow the faults to be determined directly, if the identifiability condition for these process coefficients is satisfied, see Section 5.2.3 and Chapter 23.4. However, the determination of these physical process coefficients is not strictly necessary, because certain faults result in certain parameter changes $\Delta\Theta$. Based on patterns of these parameter changes, the faults can then be diagnosed simply, also by using classification and other diagnosis methods, as will be shown in several examples in Part V.

For discrete-time models, the relation between the parameter estimates and the continuous-time model parameters is rather complicated, as can be seen from a z-transform table. Calculation of the continuous-time parameters is only recommended for first-order systems. This means that for processes of order $m \geq 2$ only classification and other diagnosis methods are applicable. But also in this case, simple patterns of parameter changes may be sufficient in practical applications.
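For the first-order case the conversion is simple enough to write down. Assuming (this is not stated in the text) that $y(k) + a_1 y(k-1) = b_1 u(k-1)$ results from a zero-order-hold discretization of $K/(Ts+1)$ with sampling time $T_0$, then $a_1 = -e^{-T_0/T}$ and $b_1 = K(1 - e^{-T_0/T})$, which can be inverted as in the following sketch (function name hypothetical):

```python
import math

def first_order_continuous(a1, b1, T0):
    """Gain K and time constant T from y(k) + a1*y(k-1) = b1*u(k-1),
    assuming a zero-order-hold discretization of K/(T s + 1) with sampling time T0."""
    if not -1.0 < a1 < 0.0:
        raise ValueError("expected -1 < a1 < 0 for a stable, non-oscillating first-order lag")
    T = -T0 / math.log(-a1)   # from a1 = -exp(-T0/T)
    K = b1 / (1.0 + a1)       # static gain of the difference equation
    return K, T
```

With $T_0 = 0.1$, $K = 3$ and $T = 2$ the discrete coefficients are $a_1 \approx -0.951$, $b_1 \approx 0.146$, and the function recovers $K = 3$, $T = 2$.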

b) Neural networks 

Because neural networks usually contain many parameters in the form of weights, which do not allow a physical interpretation, these parameters seem not to be suitable for fault detection. For nonlinear static models, look-up tables in particular allow a direct comparison of output signals, e.g. in the form of look-up table differences $\Delta y(u_1, u_2, \dots)$. The identified neural network models are mostly used to generate output signal symptoms.
c) Input excitation

Parameter estimation of dynamic processes requires an excitation through the input signal. For processes which change their inputs in normal operation anyway, like servo control systems, actuators, machine tools and robots, vehicles and other mobile systems, identification methods can be applied directly, possibly under the condition of sufficiently rich or persistent excitation, e.g. [9.30], [9.31]. Otherwise, special exciting test signals, like step functions, harmonic oscillations or PRBS, have to be applied. Depending on the operating conditions, this can in some cases be tolerated from time to time, like for pumps, or be applied without any problems for end-of-line quality control of assembled products.
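A PRBS is commonly generated with a feedback shift register. The sketch below is one such generator, not a signal prescribed by the text; the register length of 4, the feedback taps (stages 4 and 3) and the initial state are assumptions. Other register lengths need feedback taps that correspond to a primitive polynomial; only the default setting was checked here.

```python
def prbs(n_bits=4, taps=(4, 3), length=None, amplitude=1.0):
    """Maximum-length binary sequence from a Fibonacci shift register.

    With n_bits=4 and feedback from stages 4 and 3 (polynomial x^4 + x^3 + 1)
    the period is 2**4 - 1 = 15 samples. Output values are +amplitude / -amplitude.
    """
    state = [1] * n_bits                # any non-zero initial state works
    if length is None:
        length = 2 ** n_bits - 1        # one full period
    out = []
    for _ in range(length):
        out.append(amplitude if state[-1] else -amplitude)  # output of the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift by one stage
    return out
```

One period has 15 samples, eight at +1 and seven at −1. In practice each value would be held for a chosen clock interval, which is a further design parameter not covered here.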


9.5 Problems 

1) The following linear first-order difference equation is given: $y(k) + a_1 y(k-1) = b_1 u(k-1)$. The measured input and output signals $u(k)$ and $y(k)$, respectively, with $k = \dots$ are given. What are the estimation equations to calculate the parameters $a_1$ and $b_1$ with the least squares method? What has to be changed in the estimation equation if a dead time of $d = 5$ is added?

2) Derive the estimation equation for the estimation of the weighting function $g(k)$, $k = 0, \dots, n$ (impulse response) of a linear process using the least squares method, given the measured input and output signals $u(k)$ and $y(k)$ ($k = 1, \dots, N$), respectively.

3) The linear first-order system

$$
y(k) + a_1\,y(k-1) = b_1\,u(k-1)
$$

is given with the following measurements:

| k    | 0 | 1 | 2  | 3 | 4 | 5 | 6  | 7  | 8 | 9 | 10 |
| u(k) | 0 | T | -1 | 1 | T | 1 | -1 | T  | 0 | 0 | 0  |
| y(k) | 0 | 0 | 0  | 0 | 0 | 0 | T  | IT | 0 | 0 | 0  |

Calculate the estimates of the parameters $a_1$ and $b_1$.

4) State the recursive estimation equations for the least squares method for the linear first-order process

$$
y(k) + a_1\,y(k-1) = b_1\,u(k-1)
$$

What has to be changed in the estimation equations if a dead time of $d = 2$ is added?

5) The first-order differential equation

$$
\dot{y}(t) + a_1\,y(t) = b_0\,u(t)
$$

describes a linear system in the continuous-time domain. The input and output signals $u(t)$ and $y(t)$ and the derivative $\dot{y}(t)$ are measured at discrete time samples $k = 1, \dots, N$. Determine the equations to estimate the parameters $a_1$ and $b_0$ of the differential equation with the least squares method.
6) What are the conditions for the utilization of direct parameter estimation methods (e.g. the LS method) for the parameters of polynomial nonlinear models?

7) Sketch a neuron of an MLP network and of an RBF network.

8) The training of MLP networks and the training of the output layer weights of RBF networks should be compared. Which kinds of optimization methods are applicable? Characterize the different optimization methods.

9) Consider a two-layer perceptron network with P inputs, K hidden neurons and 
M output units. Write down an expression for the total number of network 
weights. 

10) Determine the equations for the interpolation of a grid-based look-up table with one input and one output. What kind of interpolation appears in this one-dimensional case?

11) The Coulomb friction of a second-order oscillating system increases because of missing lubrication. How can a symptom for fault detection be generated? Use equations.

12) Determine the vectors of (9.120) for parameter estimation with continuous-time signals for the DC motor described in Example 10.3 and design the corresponding state-variable filter.

13) Determine the recursive parameter estimation algorithm for a model of the DC 
motor in discrete time. Define the required vectors. 
Fault detection with parity equations


A straightforward way to detect process faults is to compare the process behavior with a process model describing the nominal, i.e. non-faulty, behavior. The differences between the signals of the process and the model are expressed by residuals. Residuals therefore describe discrepancies between the process and the model and serve as a consistency check, [10.8]. The residuals can be designed with transfer functions or in state-space formulation. The method of parity equations probably goes back to [10.5], with a formulation for state-space models. Further publications have presented this method for different model structures, e.g. for input-output models by [10.10], [10.11] and for enhanced state-space models by [10.23] and others, see Chapter 11.


10.1 Parity equations with transfer functions 


Figure 10.1 shows two arrangements for the case of linear processes. To explain 
the method, first a single-input single-output process is considered. The process is 
described by the transfer function 


$$
G_p(s) = \frac{y_p(s)}{u(s)} = \frac{B_p(s)}{A_p(s)} \tag{10.1}
$$

and the process model by

$$
G_m(s) = \frac{y_m(s)}{u(s)} = \frac{B_m(s)}{A_m(s)} \tag{10.2}
$$


The model is assumed to be known and has known, fixed parameters, such that 


$$
G_p(s) = G_m(s) + \Delta G_m(s) \tag{10.3}
$$

where $\Delta G_m(s)$ describes model errors.

The residuals can now be formulated by the output error or the polynomial error, 
similar to parameter estimation methods. 
Fig. 10.1. Residual generation with parity equations for a MIMO process with transfer functions: (a) output errors; (b) polynomial errors or equation errors


For the output error, the residual becomes

$$
\begin{aligned}
r'(s) &= y_p(s) - y_m(s) = y_p(s) - G_m(s)\,u(s) \\
&= G_p(s)\,[u(s) + f_u(s)] + n(s) + f_y(s) - G_m(s)\,u(s) \\
&= \Delta G_m(s)\,u(s) + G_p(s)\,f_u(s) + n(s) + f_y(s)
\end{aligned}
\tag{10.4}
$$

The residual is zero for ideal matching of process and model, no additive faults $f_u$ and $f_y$ and no noise. Usually, it shows deviations depending on the model error $\Delta G_m$, the noise $n$ and the exciting input signal $u$. For additive faults, the residual changes are identical with the output fault $f_y$, whereas input faults $f_u$ appear filtered by the process $G_p$.

The polynomial error (or equation error) leads to

$$
\begin{aligned}
r(s) &= A_m(s)\,y_p(s) - B_m(s)\,u(s) \\
&= A_m(s)\,\big[G_p(s)\,[u(s) + f_u(s)] + n(s) + f_y(s)\big] - B_m(s)\,u(s)
\end{aligned}
\tag{10.5}
$$

If the process and the model agree, the residual ideally becomes

$$
r(s) = A_m(s)\,[f_y(s) + n(s)] + B_m(s)\,f_u(s) \tag{10.6}
$$

Additive input faults $f_u$ are filtered by the model polynomial $B_m(s)$ and additive output faults $f_y$ by the polynomial $A_m(s)$, both of which may contain higher-order derivatives. (10.4) and (10.5) are parity equations, and $r'$ and $r$ are called primary residuals, [10.8].

The residuals of the considered single-input single-output process are in both cases influenced by additive input and output faults, by the noise and by model errors (e.g. by parameter changes), and a separation is usually not possible. However, the situation improves if more measurements are available, as, for example, for multi-input multi-output (MIMO) processes.

The output residual of a MIMO process with transfer function matrix $\mathbf{G}_p(s)$ is calculated by

$$
\mathbf{r}'(s) = \mathbf{y}_p(s) - \mathbf{y}_m(s) = \mathbf{y}_p(s) - \mathbf{G}_m(s)\,\mathbf{u}(s) \tag{10.7}
$$
which is called the computational form of the parity equation, [10.8]. In this way, the residuals are calculated using the measured process input and output signals. If the assumed faults are introduced, one obtains

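$$
\begin{aligned}
\mathbf{r}'(s) &= \mathbf{G}_p(s)\,[\mathbf{u}(s) + \mathbf{f}_u(s)] + \mathbf{f}_y(s) + \mathbf{n}(s) - \mathbf{G}_m(s)\,\mathbf{u}(s) \\
&= \Delta\mathbf{G}_m(s)\,\mathbf{u}(s) + \mathbf{G}_p(s)\,\mathbf{f}_u(s) + \mathbf{f}_y(s) + \mathbf{n}(s)
\end{aligned}
\tag{10.8}
$$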

which shows the influence of the faults on the residual vector. This is called the internal form, unknown-input-effect form or evaluation form, [10.8], [10.4].

If process and model are identical, the internal form of the residual equation reduces to

$$
\mathbf{r}'(s) = \mathbf{G}_p(s)\,\mathbf{f}_u(s) + \mathbf{f}_y(s) + \mathbf{n}(s) \tag{10.9}
$$

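For deriving the equation error residuals, one may write the process in the form

$$
\mathbf{A}_p(s)\,\mathbf{y}_p(s) = \mathbf{B}_p(s)\,\mathbf{u}(s) \tag{10.10}
$$

The computational form of the polynomial residual is then

$$
\mathbf{r}(s) = \mathbf{A}_m(s)\,\mathbf{y}_p(s) - \mathbf{B}_m(s)\,\mathbf{u}(s) \tag{10.11}
$$

The internal form becomes

$$
\mathbf{r}(s) = \mathbf{A}_m(s)\,[\mathbf{G}_p(s)\,\mathbf{u}(s) + \mathbf{G}_p(s)\,\mathbf{f}_u(s) + \mathbf{f}_y(s) + \mathbf{n}(s)] - \mathbf{B}_m(s)\,\mathbf{u}(s) \tag{10.12}
$$

and if process and model are identical

$$
\mathbf{r}(s) = \mathbf{A}_m(s)\,[\mathbf{f}_y(s) + \mathbf{n}(s)] + \mathbf{B}_m(s)\,\mathbf{f}_u(s) \tag{10.13}
$$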

The number of residuals equals the number of output signals. 

If only single input or single output faults appear, some elements of the residual 
vector deviate differently and some do not, which makes a separation or isolation of 
the additive faults possible as will be shown later. 

A comparison of (10.9) and (10.13) shows that the two types of primary residuals are related by

$$
\mathbf{r}(s) = \mathbf{A}_m(s)\,\mathbf{r}'(s) \tag{10.14}
$$

Therefore, the polynomial residual includes higher-order derivatives of the signals, which may lead to realizability problems and to an amplification of high-frequency noise.


Example 10.1: Comparison of parity equations for a simulated process 

a) Simulated process 

To show the effects of different faults on the output and equation error, a linear first 
order process is considered, described by the transfer function 
$$
G_p(s) = \frac{y(s)}{u(s)} = \frac{Y(s) - Y_0}{U(s) - U_0} = \frac{K}{T s + 1}
$$

where $U_0$, $Y_0$ is the operating point in steady state. This relatively simple process was selected to allow both simulations and analytical expressions for the different designs of parity relations, [10.19]. First, the system is subject to faults in the process parameters, which occur either stepwise or rampwise. These are the two cases most likely encountered in real systems, mimicking both suddenly appearing faults and slowly drifting faults. Considering faults in both process parameters, the transfer function becomes


$$
G(s) = \frac{K + \Delta K}{(T + \Delta T)\,s + 1}
$$

where $\Delta K$ and $\Delta T$ describe the individual process parameter faults.

The process input is initially at $U_0$, the output at $Y_0$. At the onset of the fault, two different cases will be examined: the system input either remains constant or jumps stepwise from $U_0$ to $U_1$. One can derive two parity equations, one based on the output error, given as


$$
r'(s) = y(s) - \frac{K_M}{T_M s + 1}\,u(s)
$$

where the index $M$ denotes the coefficients of the model. The other scheme for forming a parity equation is the equation error or polynomial error, yielding

$$
r(s) = A_M(s)\,y(s) - B_M(s)\,u(s) = (T_M s + 1)\,y(s) - K_M\,u(s)
$$


The corresponding schemes are shown in Figure 10.1. 

Besides process parameter faults, the system will also be subject to additive faults at the input and output. These faults are also modelled both as a step and as a ramp. Furthermore, the system is subject to noise acting at the input or the output of the system. Considering this, the output becomes

$$
y(s) = \frac{K}{T s + 1}\,\big[u(s) + f_u(s) + n_u(s)\big] + f_y(s) + n_y(s)
$$

where $f_u(s)$ is the additive fault at the input and $n_u(s)$ the injected input noise. $f_y(s)$ denotes an additive fault at the output and $n_y(s)$ the output noise.


b) Process parameter faults 

If the system input remains unchanged and the gain change $\Delta K$ appears suddenly (stepwise), the response of the residuals will be

$$
r'(t) = U_0\,\Delta K\,\big(1 - e^{-t/T}\big); \qquad r(t) = U_0\,\Delta K
$$

If a step from $U_0$ to $U_1$ is applied to the system input at the same time as the fault occurs, the residuals will react as

$$
r'(t) = U_1\,\Delta K\,\big(1 - e^{-t/T}\big); \qquad r(t) = U_1\,\Delta K
$$

If the time constant $T$ changes by $\Delta T$, then for a constant system input the residuals are given by

$$
r'(t) = 0 \quad \text{and} \quad r(t) = 0
$$

Thus, during steady-state operation, the parity equations are insensitive to faults in $T$.

Detectability of faults in $T$ is also difficult during dynamic operation. If a step input is applied to the system, the residuals are only excited dynamically:

$$
r'(t) = -K\,(U_1 - U_0)\,\big(e^{-t/(T+\Delta T)} - e^{-t/T}\big) \quad \text{and} \quad r(t) = -\frac{\Delta T\,K\,(U_1 - U_0)}{T + \Delta T}\,e^{-t/(T+\Delta T)}
$$

Figure 10.2 shows the time histories of the residuals for stepwise and also driftwise changes of the parametric faults.



Fig. 10.2. Influence of parameter faults $\Delta K$, $\Delta T$ on the output residual $r'(t)$ and the equation error residual $r(t)$ for different time histories of the faults: step or drift, and no input or stepwise input excitation
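The parameter-fault results above can be cross-checked with a simple simulation. In this sketch, the function names, Euler integration with step `dt`, and the availability of the exact process derivative for the equation error are assumptions; the closed-form expressions for a fault in $T$ are those given in the text.

```python
import math

def step_fault_residuals(K, T, U0, U1, t, dK=0.0, dT=0.0, dt=1e-4):
    """Euler simulation of Example 10.1 with parameter faults.

    Process (K + dK)/((T + dT) s + 1) and fault-free model K/(T s + 1) both start in
    steady state at K*U0; at t = 0 the input steps from U0 to U1 and the faults appear.
    Returns the output error r'(t) and the equation error r(t) = T y' + y - K u.
    """
    Kp, Tp = K + dK, T + dT             # faulty process parameters
    y = ym = K * U0                     # common initial steady state
    for _ in range(int(round(t / dt))):
        y += dt * (Kp * U1 - y) / Tp    # process
        ym += dt * (K * U1 - ym) / T    # model
    dy = (Kp * U1 - y) / Tp             # exact process derivative (assumed measurable)
    return y - ym, T * dy + y - K * U1

def analytic_dT_step(K, T, dT, U0, U1, t):
    """Closed-form residuals for a step fault in T together with an input step."""
    e_fault, e_model = math.exp(-t / (T + dT)), math.exp(-t / T)
    r_out = -K * (U1 - U0) * (e_fault - e_model)
    r_eq = -dT * K * (U1 - U0) / (T + dT) * e_fault
    return r_out, r_eq
```

With $K = 2$, $T = 1$, $\Delta T = 0.5$ and an input step from 0 to 1, the simulated and analytic residuals agree closely at $t = 1$, and for a constant input both residuals stay exactly zero.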
c) Additive faults at the input/output

If an additive stepwise fault $f_u(t) = \Delta f_u\,\sigma(t)$ is applied at the process input, with $\sigma(t)$ being the step function, the residuals react as

$$
r'(t) = K\,\Delta f_u\,\big(1 - e^{-t/T}\big) \quad \text{and} \quad r(t) = K\,\Delta f_u
$$

If a stepwise additive fault $f_y(t)$ appears at the output, independent of the signal applied at the process input, the residuals react as

$$
r'(t) = \Delta f_y \quad \text{and} \quad r(t) = \Delta f_y\,\big(1 + T\,\delta(t)\big)
$$

Figure 10.3 shows the time histories of the residuals for stepwise and driftwise additive faults.

Fig. 10.3. Influence of additive faults at the input and output on the residuals for step- and driftwise fault development over time. Results are independent of input excitation $u(t)$
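The additive-fault responses can be collected in one function. The step cases are the expressions given above; the ramp cases are derived here from $r' = \frac{K}{Ts+1} f_u$, $r = K f_u$ for input faults and $r' = f_y$, $r = (Ts+1) f_y$ for output faults (exact model assumed). The function and argument names are hypothetical.

```python
import math

def additive_fault_residuals(t, K, T, fault="input", shape="step", size=1.0):
    """Closed-form residuals (r', r) of Example 10.1 at time t > 0 for an additive fault
    starting at t = 0; exact model (K_M = K, T_M = T) and no noise assumed.

    shape="step": f(t) = size;  shape="ramp": f(t) = size * t
    """
    if shape not in ("step", "ramp"):
        raise ValueError("shape must be 'step' or 'ramp'")
    e = math.exp(-t / T)
    if fault == "input":              # r' = K/(Ts+1) f_u,  r = K f_u
        if shape == "step":
            return K * size * (1.0 - e), K * size
        return K * size * (t - T * (1.0 - e)), K * size * t
    if fault == "output":             # r' = f_y,  r = (Ts+1) f_y
        if shape == "step":
            return size, size         # impulse T*size*delta(t) at t = 0 not represented
        return size * t, size * (t + T)
    raise ValueError("fault must be 'input' or 'output'")
```

For a drifting output fault, the equation error leads the output error by the constant amount $T$ times the drift rate, whereas for input faults the output error lags behind the equation error.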


d) Influence of noise at the input and output 

The points of attack of the noise are identical with the points of attack of the additive faults. The power density spectra are denoted as $S_{n_u n_u}(\omega)$ and $S_{n_y n_y}(\omega)$, respectively.

Noise $n_u(t)$ at the input appears in the parity equations with the spectra

$$
S_{r'r'}(\omega) = \frac{K^2}{1 + \omega^2 T^2}\,S_{n_u n_u}(\omega) \quad \text{and} \quad S_{rr}(\omega) = K^2\,S_{n_u n_u}(\omega)
$$

For the output error scheme, input noise appears as low-pass filtered noise; the constant component ($\omega = 0$) is weighted by the gain $K$. For the equation error, all spectral components are weighted by the same, frequency-independent gain $K$.

If a noise source $n_y(t)$ is present at the output, the noise is simply added to the output error residual $r'(t)$. For the equation error residual $r(t)$, the noise is high-pass filtered with the gain $\sqrt{1 + \omega^2 T^2}$, which is greater than one for all frequencies $\omega > 0$. Thus, high frequencies are amplified with a large gain, which is undesirable in most applications.
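The frequency dependence behind these statements can be tabulated with a short helper. The sketch (function name hypothetical) returns amplitude gains; their squares are the factors relating noise and residual spectra, e.g. $K^2/(1 + \omega^2 T^2)$ for input noise in $r'$.

```python
import math

def noise_gains(omega, K, T):
    """Amplitude gains from noise to residual in Example 10.1 (exact model assumed).

    Returns (n_u -> r', n_u -> r, n_y -> r', n_y -> r); squaring a gain gives the
    factor between the noise spectrum and the residual spectrum.
    """
    lag = math.sqrt(1.0 + (omega * T) ** 2)
    return K / lag, K, 1.0, lag
```

At $\omega = 1/T$, for instance, input noise reaches $r'$ with gain $K/\sqrt{2}$ and $r$ with gain $K$, while output noise reaches $r$ with gain $\sqrt{2}$.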

e) Summary 

Regarding threshold design, there is no difference between output and equation error for faults in $K$ or simultaneous faults in $K$ and $T$; the thresholds for both residuals will be equal. Thus, the selection of the residual depends on other criteria, such as noise. With faults in $T$, the reaction of $r(t)$ to a stepwise fault in the presence of a stepwise excitation is stronger than that of the residual $r'(t)$. Thus, one should opt for the equation error, provided there is no output noise present.

Furthermore, the output error $r'(t)$ and the equation error $r(t)$ react similarly to additive faults at the process input and output. The values for $t \to \infty$ are the same; it only takes more time for $r'(t)$ to reach this value. Thus, whether $r(t)$ should be preferred depends on the time constant $T$. $r'(t)$ should be preferred if the process has strong high-frequency noise $n_u(t)$ at the input. Then, the low-pass behavior of the output error filters this noise out, while $r(t)$ passes it with the gain $K$ for all frequencies. Low frequencies are amplified with the same gain for $r'(t)$ and $r(t)$. If high-frequency noise occurs at the system output, one should prefer the output residual $r'(t)$, because in this case all frequency components remain unchanged. For high-frequency noise with large amplitudes at the system output, $r(t)$ should not be used, because this noise is amplified by $\sqrt{1 + \omega^2 T^2}$. These results are summarized in Table 10.1.

The results given here are an extract of [10.19]. There, the time histories for driftwise faults are also calculated, including simultaneous changes of the gain and the time constant. Furthermore, the sensitivities of the residuals with regard to the faults are given and tabulated.

□ 


10.2 Parity equations with state-space models 

10.2.1 Continuous-time parity approach 

The state-space model of a linear multi-input multi-output process according to Figure 10.4 is considered, which is described by
Table 10.1. Properties of output residuals $r'(t)$ and equation residuals $r(t)$
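$$
\dot{\mathbf{x}}(t) = \mathbf{A}\,\mathbf{x}(t) + \mathbf{B}\,\mathbf{u}(t) + \mathbf{V}\,\mathbf{v}(t) + \mathbf{L}\,\mathbf{f}(t) \tag{10.15}
$$

$$
\mathbf{y}(t) = \mathbf{C}\,\mathbf{x}(t) + \mathbf{N}\,\mathbf{n}(t) + \mathbf{M}\,\mathbf{f}(t) \tag{10.16}
$$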


where $\mathbf{n}(t)$ are noise disturbances, $\mathbf{v}(t)$ unmeasurable inputs or disturbances, and $\mathbf{f}(t)$ additive faults, which may be composed of additive faults $\mathbf{f}_l(t)$ on the input and $\mathbf{f}_m(t)$ on the output

$$
\mathbf{f}^T(t) = \big[\,\mathbf{f}_m^T(t) \;\; \mathbf{f}_l^T(t)\,\big] \tag{10.17}
$$


The design of the residuals follows the original approach from [10.5], see also [10.7] and [10.12].
Fig. 10.4. Parity equations based on state-space model for continuous time


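Introducing (10.15) in (10.16) yields

$$
\begin{aligned}
\dot{\mathbf{y}}(t) &= \mathbf{C}\,\dot{\mathbf{x}}(t) + \mathbf{N}\,\dot{\mathbf{n}}(t) + \mathbf{M}\,\dot{\mathbf{f}}(t) \\
&= \mathbf{CA}\,\mathbf{x}(t) + \mathbf{CB}\,\mathbf{u}(t) + \mathbf{CV}\,\mathbf{v}(t) + \mathbf{CL}\,\mathbf{f}(t) + \mathbf{N}\,\dot{\mathbf{n}}(t) + \mathbf{M}\,\dot{\mathbf{f}}(t)
\end{aligned}
\tag{10.18}
$$

The second derivative gives

$$
\begin{aligned}
\ddot{\mathbf{y}}(t) &= \mathbf{C}\,\ddot{\mathbf{x}}(t) + \mathbf{N}\,\ddot{\mathbf{n}}(t) + \mathbf{M}\,\ddot{\mathbf{f}}(t) \\
&= \mathbf{CA}^2\,\mathbf{x}(t) + \mathbf{CAB}\,\mathbf{u}(t) + \mathbf{CB}\,\dot{\mathbf{u}}(t) + \mathbf{CAV}\,\mathbf{v}(t) \\
&\quad + \mathbf{CV}\,\dot{\mathbf{v}}(t) + \mathbf{N}\,\ddot{\mathbf{n}}(t) + \mathbf{CAL}\,\mathbf{f}(t) + \mathbf{CL}\,\dot{\mathbf{f}}(t) + \mathbf{M}\,\ddot{\mathbf{f}}(t)
\end{aligned}
\tag{10.19}
$$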

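In this way, a redundancy in the equations for the same time instant $t$ is generated. Increasing the number of derivatives of $\mathbf{y}(t)$ up to the order $q < n$ leads to the equation system

$$
\mathbf{Y}(t) = \mathbf{T}\,\mathbf{x}(t) + \mathbf{Q}_u\,\mathbf{U}(t) + \mathbf{Q}_v\,\mathbf{V}(t) + \mathbf{Q}_n\,\mathbf{N}(t) + \mathbf{Q}_f\,\mathbf{F}(t) \tag{10.20}
$$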

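with

$$
\mathbf{Y}(t) = \begin{bmatrix} \mathbf{y}(t) \\ \dot{\mathbf{y}}(t) \\ \vdots \\ \mathbf{y}^{(q)}(t) \end{bmatrix}, \quad
\mathbf{U}(t) = \begin{bmatrix} \mathbf{u}(t) \\ \dot{\mathbf{u}}(t) \\ \vdots \\ \mathbf{u}^{(q)}(t) \end{bmatrix}, \quad
\mathbf{V}(t) = \begin{bmatrix} \mathbf{v}(t) \\ \dot{\mathbf{v}}(t) \\ \vdots \\ \mathbf{v}^{(q)}(t) \end{bmatrix}
$$

and

$$
\mathbf{T} = \begin{bmatrix} \mathbf{C} \\ \mathbf{CA} \\ \mathbf{CA}^2 \\ \vdots \\ \mathbf{CA}^q \end{bmatrix}, \quad
\mathbf{Q}_v = \begin{bmatrix}
\mathbf{N} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{CV} & \mathbf{N} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{CAV} & \mathbf{CV} & \mathbf{N} & \cdots & \mathbf{0} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{CA}^{q-1}\mathbf{V} & \mathbf{CA}^{q-2}\mathbf{V} & \cdots & \mathbf{CV} & \mathbf{N}
\end{bmatrix}
$$

$$
\mathbf{Q}_f = \begin{bmatrix}
\mathbf{M} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{CL} & \mathbf{M} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{CAL} & \mathbf{CL} & \mathbf{M} & \cdots & \mathbf{0} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{CA}^{q-1}\mathbf{L} & \mathbf{CA}^{q-2}\mathbf{L} & \cdots & \mathbf{CL} & \mathbf{M}
\end{bmatrix}
$$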

For a system of order $n$ with $p$ inputs, $p_v$ disturbances and $r$ outputs, these matrices have the following orders:

• $\mathbf{Y}(t)$ is a $(q+1)r \times 1$ vector;

• $\mathbf{U}(t)$ is a $(q+1)p \times 1$ vector;

• $\mathbf{T}$ is a $(q+1)r \times n$ matrix;

• $\mathbf{Q}_u$ is a $(q+1)r \times (q+1)p$ matrix;

• $\mathbf{Q}_v$ is a $(q+1)r \times (q+1)p_v$ matrix.

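As the state vector $\mathbf{x}(t)$ and the disturbance $\mathbf{v}(t)$ are unknown, (10.20) is multiplied by a vector $\mathbf{w}^T$

$$
\mathbf{w}^T\mathbf{Y}(t) = \mathbf{w}^T\mathbf{T}\,\mathbf{x}(t) + \mathbf{w}^T\mathbf{Q}_u\,\mathbf{U}(t) + \mathbf{w}^T\mathbf{Q}_v\,\mathbf{V}(t) + \mathbf{w}^T\mathbf{Q}_n\,\mathbf{N}(t) + \mathbf{w}^T\mathbf{Q}_f\,\mathbf{F}(t) \tag{10.24}
$$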

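By selecting $\mathbf{w}^T$ (dimension $1 \times (q+1)r$) such that

$$
\mathbf{w}^T\mathbf{T} = \mathbf{0} \quad \text{and} \quad \mathbf{w}^T\mathbf{Q}_v = \mathbf{0} \tag{10.25}
$$

a residual in the computational form is obtained

$$
r(t) = \mathbf{w}^T\mathbf{Y}(t) - \mathbf{w}^T\mathbf{Q}_u\,\mathbf{U}(t) \tag{10.26}
$$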

Through (10.25), a part of the elements of $\mathbf{w}^T$ is determined according to the orders of $\mathbf{T}$ and $\mathbf{Q}_v$. The remaining elements can be used to design different parity equations. Inserting (10.24) into (10.26) yields the internal form of the parity equation

$$
r(t) = \mathbf{w}^T\mathbf{Q}_f\,\mathbf{F}(t) + \mathbf{w}^T\mathbf{Q}_n\,\mathbf{N}(t) \tag{10.27}
$$

which shows how the residual is affected by the faults $\mathbf{F}(t)$ and the noise $\mathbf{N}(t)$. If (10.25) is satisfied, the residual is independent of the unknown input $\mathbf{v}(t)$ and the states $\mathbf{x}(t)$. More residuals are obtained by selecting several different vectors $\mathbf{w}^T$, thus forming a matrix $\mathbf{W}$, and the residual vector becomes

$$
\mathbf{r}(t) = \mathbf{W}\,\mathbf{Y}(t) - \mathbf{W}\mathbf{Q}_u\,\mathbf{U}(t) \tag{10.28}
$$

compare Figure 10.4.
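The number of rows of $\mathbf{W}$ determines the number of parity equations. For the Laplace transformation of (10.28) (assuming zero initial conditions) one needs

$$
\mathbf{Y}(s) = \begin{bmatrix} \mathbf{y}(s) \\ s\,\mathbf{y}(s) \\ \vdots \\ s^q\,\mathbf{y}(s) \end{bmatrix} = \begin{bmatrix} \mathbf{I} \\ s\,\mathbf{I} \\ \vdots \\ s^q\,\mathbf{I} \end{bmatrix} \mathbf{y}(s) = \mathbf{L}_y(s)\,\mathbf{y}(s), \qquad
\mathbf{U}(s) = \begin{bmatrix} \mathbf{u}(s) \\ s\,\mathbf{u}(s) \\ \vdots \\ s^q\,\mathbf{u}(s) \end{bmatrix} = \begin{bmatrix} \mathbf{I} \\ s\,\mathbf{I} \\ \vdots \\ s^q\,\mathbf{I} \end{bmatrix} \mathbf{u}(s) = \mathbf{L}_u(s)\,\mathbf{u}(s)
$$

and obtains

$$
\mathbf{r}(s) = \mathbf{W}\,\mathbf{L}_y(s)\,\mathbf{y}(s) - \mathbf{W}\mathbf{Q}_u\,\mathbf{L}_u(s)\,\mathbf{u}(s) \tag{10.29}
$$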

which shows similarities to the equation error residual for parity equations with transfer functions, (10.5).
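A scalar worked example (assumed here for illustration, not taken from the text) makes the link to (10.5) explicit. Take $n = p = r = 1$, $q = 1$, no disturbances, noise or faults, and $\dot{x} = a x + b u$, $y = c x$ with $c \neq 0$:

```latex
\begin{align*}
\mathbf{Y}(t) = \begin{bmatrix} y \\ \dot{y} \end{bmatrix}
  &= \underbrace{\begin{bmatrix} c \\ ca \end{bmatrix}}_{\mathbf{T}} x
   + \underbrace{\begin{bmatrix} 0 & 0 \\ cb & 0 \end{bmatrix}}_{\mathbf{Q}_u}
     \begin{bmatrix} u \\ \dot{u} \end{bmatrix} \\
\mathbf{w}^T = \begin{bmatrix} a & -1 \end{bmatrix}
  &\;\Rightarrow\; \mathbf{w}^T\mathbf{T} = ac - ca = 0 \\
r(t) = \mathbf{w}^T\mathbf{Y}(t) - \mathbf{w}^T\mathbf{Q}_u\mathbf{U}(t)
  &= a\,y(t) - \dot{y}(t) + cb\,u(t) \\
r(s) &= -\big[(s - a)\,y(s) - cb\,u(s)\big]
\end{align*}
```

Up to its sign, this is the equation error $A(s)\,y(s) - B(s)\,u(s)$ of the transfer function $G(s) = cb/(s - a)$, and it vanishes identically because $\dot{y} = a\,y + cb\,u$.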

The discussed approach of parity equations with state-space models needs derivatives up to order $q$ of the measured inputs and outputs. They can be obtained by state-variable filters, e.g. [10.17] and Chapter 23.2. As the parity equations use the information of the signals at only one time instant, relatively noisy results can be expected in practice. Therefore, discrete-time versions should be preferred, as treated in the next section.

10.2.2 Discrete-time parity approach 

The parity equations with state-space models are now considered for discrete-time models. In this form they are simpler to derive and to implement than in continuous time. According to Figure 10.5, the basic process equations are

$$
\mathbf{x}(k+1) = \mathbf{A}\,\mathbf{x}(k) + \mathbf{B}\,\mathbf{u}(k) + \mathbf{V}\,\mathbf{v}(k) + \mathbf{L}\,\mathbf{f}(k) \tag{10.30}
$$

$$
\mathbf{y}(k) = \mathbf{C}\,\mathbf{x}(k) + \mathbf{N}\,\mathbf{n}(k) + \mathbf{M}\,\mathbf{f}(k) \tag{10.31}
$$

where $\mathbf{v}(k)$ and $\mathbf{n}(k)$ are unmeasurable disturbance signals and $\mathbf{f}(k)$ are additive faults, which may be composed of additive faults $\mathbf{f}_l(k)$ on the input and $\mathbf{f}_m(k)$ on the output.

The design of the residuals follows the original approach from [10.5], see also [10.7] and [10.12]. To simplify the notation, the state-space model without faults and disturbances is used

$$
\mathbf{x}(k+1) = \mathbf{A}\,\mathbf{x}(k) + \mathbf{B}\,\mathbf{u}(k) \tag{10.32}
$$

$$
\mathbf{y}(k) = \mathbf{C}\,\mathbf{x}(k) \tag{10.33}
$$

Introducing (10.32) in (10.33) yields

$$
\mathbf{y}(k+1) = \mathbf{CA}\,\mathbf{x}(k) + \mathbf{CB}\,\mathbf{u}(k) \tag{10.34}
$$
Fig. 10.5. Residual generation with parity equations in discrete time for a MIMO process with a state-space model

u  process input vector (1 × p)      n  noise vector (1 × r)
x  process state vector (1 × m)      v  input noise vector (1 × m)
y  process output vector (1 × r)     f  fault vector

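For the next sampled output it holds

$$
\begin{aligned}
\mathbf{y}(k+2) &= \mathbf{C}\,\mathbf{x}(k+2) \\
&= \mathbf{CA}\,\mathbf{x}(k+1) + \mathbf{CB}\,\mathbf{u}(k+1) \\
&= \mathbf{CA}^2\,\mathbf{x}(k) + \mathbf{CAB}\,\mathbf{u}(k) + \mathbf{CB}\,\mathbf{u}(k+1)
\end{aligned}
\tag{10.35}
$$

and for the sample $k + q$ ($q < m$)

$$
\mathbf{y}(k+q) = \mathbf{CA}^q\,\mathbf{x}(k) + \mathbf{CA}^{q-1}\mathbf{B}\,\mathbf{u}(k) + \dots + \mathbf{CB}\,\mathbf{u}(k+q-1) \tag{10.36}
$$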

Here, redundant equations are generated for different time instants. Now, the equations for a time window of length $q+1$ are summarized and lead to

$$
\mathbf{Y}(k+q) = \mathbf{T}\,\mathbf{x}(k) + \mathbf{Q}\,\mathbf{U}(k+q) \tag{10.37}
$$

or, shifted by $q$ backwards in time,

$$
\mathbf{Y}(k) = \mathbf{T}\,\mathbf{x}(k-q) + \mathbf{Q}\,\mathbf{U}(k) \tag{10.38}
$$


with the vectors 


y (k - q) 


u(k — q) 

y(k ~q+ 1) 

u (k) = 

u (k -q + 1) 

y (k) 


ii(A-) 


Y (k) = 


(10.39) 
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and the matrices 

" C " 

C A 

T = CA 2 Q = 

_C A q _ 

Hence, (10.38) relates the $(q+1)$ input and output signals and the initial state vector $x(k-q)$ over a time interval of length $(q+1)$, thus forming a temporal redundancy, [10.23].

As the state vector $x(k-q)$ is unknown, (10.38) is multiplied by a row vector $w^T$

$$ w^T Y(k) = w^T T\,x(k-q) + w^T Q\,U(k) \tag{10.41} $$

By selecting $w^T$ such that

$$ w^T T = 0 \tag{10.42} $$

an input-output relation results, and a residual can be defined (computational form)

$$ r(k) = w^T Y(k) - w^T Q\,U(k) \tag{10.43} $$

The dimension of $w^T$ is $1 \times (q+1)r$, where $r$ is the number of outputs. If the order of $A$ is $m$, the matrix $T$ has dimension $(q+1)r \times m$. Through (10.42), $m$ elements of $w^T$ are determined, while the remaining $(q+1)r - m$ elements of $w^T$ can be chosen freely.

More residuals can be determined by multiplying (10.38) with a matrix $W$ under the condition

$$ W T = 0 \tag{10.44} $$

The residual vector then results as, see also Figure 10.5,

$$ r(k) = W\,[Y(k) - Q\,U(k)] \tag{10.45} $$

$W$ contains the vectors $w^T$ as rows, some of whose elements are determined by (10.42). The remaining elements can be chosen such that the residuals obtain special properties, e.g. structured residuals, as will be shown in Section 10.3. An example is given in [10.16], see also [10.14]. The length $q+1$ of the time window is free. Usually $q = n$ is a proper choice, see [10.5].
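To make the construction of (10.38) to (10.45) concrete, the following Python sketch builds $T$ and $Q$, computes a parity vector $w^T$ with $w^T T = 0$ as a basis of the left null space of $T$, and evaluates the residual (10.45). It is a minimal illustration under stated assumptions: the second-order SISO system, the window length $q = 2$ and all helper names are hypothetical, exact rational arithmetic (`fractions.Fraction`) is used so that the fault-free residual is exactly zero, and no structured design of $W$ is attempted.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_pow(A, n):
    """A**n for a square matrix (identity for n = 0)."""
    R = [[Fraction(int(i == j)) for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        R = mat_mul(R, A)
    return R

def build_T_Q(A, B, C, q):
    """T = [C; CA; ...; CA^q] and the block-Toeplitz Q of (10.38), (10.40)."""
    r, p = len(C), len(B[0])
    T = []
    for i in range(q + 1):
        T.extend(mat_mul(C, mat_pow(A, i)))
    Q = [[Fraction(0)] * ((q + 1) * p) for _ in range((q + 1) * r)]
    for i in range(q + 1):          # block row i belongs to y(k-q+i)
        for j in range(i):          # block (i, j) = C A^(i-j-1) B below the diagonal
            blk = mat_mul(mat_mul(C, mat_pow(A, i - j - 1)), B)
            for a in range(r):
                for b in range(p):
                    Q[i * r + a][j * p + b] = blk[a][b]
    return T, Q

def left_null_space(T):
    """Basis of all row vectors w with w T = 0 (Gauss-Jordan on T transposed)."""
    M = [list(col) for col in zip(*T)]
    rows, cols = len(M), len(M[0])
    pivots, row = [], 0
    for c in range(cols):
        piv = next((i for i in range(row, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[row], M[piv] = M[piv], M[row]
        pv = M[row][c]
        M[row] = [v / pv for v in M[row]]
        for i in range(rows):
            if i != row and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[row])]
        pivots.append(c)
        row += 1
        if row == rows:
            break
    basis = []
    for free in (c for c in range(cols) if c not in pivots):
        w = [Fraction(0)] * cols
        w[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            w[pc] = -M[i][free]
        basis.append(w)
    return basis

def simulate(A, B, C, x0, u_seq):
    """Stacked outputs y(k-q), ..., y(k) of the fault-free model (10.32), (10.33)."""
    x, ys = [[v] for v in x0], []
    for u in u_seq:
        ys.extend(row[0] for row in mat_mul(C, x))
        Ax, Bu = mat_mul(A, x), mat_mul(B, [[u]])
        x = [[a[0] + b[0]] for a, b in zip(Ax, Bu)]
    return ys

def residuals(W, Q, Y, U):
    """r(k) = W [Y(k) - Q U(k)], see (10.45)."""
    e = [y - sum(qi * ui for qi, ui in zip(row, U)) for y, row in zip(Y, Q)]
    return [sum(wi * ei for wi, ei in zip(w, e)) for w in W]

# Hypothetical second-order SISO example (m = 2, p = r = 1), window q = 2
F = Fraction
A = [[F("0.5"), F("0.1")], [F(0), F("0.8")]]
B = [[F(1)], [F("0.5")]]
C = [[F(1), F(0)]]
T, Q = build_T_Q(A, B, C, q=2)
W = left_null_space(T)                      # (q+1)r - m = 1 parity vector
u_seq = [F(1), F(2), F(-1)]                 # u(k-2), u(k-1), u(k)
Y = simulate(A, B, C, [F(1), F(-1)], u_seq)
print(residuals(W, Q, Y, u_seq))            # fault-free: exactly zero
Y_fault = Y[:-1] + [Y[-1] + F(1, 2)]        # additive offset on the latest output
print(residuals(W, Q, Y_fault, u_seq))      # non-zero
```

With $m = 2$ states, $r = 1$ output and $q = 2$, exactly $(q+1)r - m = 1$ parity vector exists. The unknown initial state $x(k-q)$ drops out, while an offset on the latest output sample shows up in the residual.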

If unknown inputs $\nu(k)$ act on the process, the residuals can be made independent of these inputs if, in addition to (10.44),

$$ W Q_\nu = 0 \tag{10.46} $$

is satisfied, where $Q_\nu$ corresponds to $Q$ but with $V$ instead of $B$, see [10.14].

After introducing the noise and fault terms from (10.30) and (10.31), the internal form of the parity equation is obtained from (10.45)

$$ r(k) = w^T Q_f\,f(k) + w^T Q_n\,n(k) \tag{10.47} $$

where $Q_f$ and $Q_n$ are matrices like $Q$, see [10.14]. The residuals then depend only on the additive faults and the noise.

Practical experience shows that the residuals have a relatively large variance due to noise and deviations between the process and its model. Therefore, the residuals must be low-pass filtered. Compared with the transfer function approach, the state-space approach gives more freedom in selecting the residual generating matrix $W$. However, it uses a shorter time history of the signals and is therefore more sensitive to noise; it is also computationally more involved and less straightforward than the transfer function version.


10.3 Properties of residuals 

In the ideal case, the residuals should only be influenced by the faults to be detected. However, because of modelling errors, unknown input signals, and stationary and non-stationary disturbances at the outputs, the residuals will vary continuously. This means that wider thresholds have to be used for the residuals. But thresholds that are too wide do not allow small faults to be detected. The following ways exist to ease this conflict:

• enhanced residuals for particular faults;

• filtering of high-frequency effects like noise from slowly changing faults by low-pass filters;

• maximizing of the fault sensitivity of residuals; 

• robustness against modelling errors; 

• adaptive thresholds, e.g. depending on the input excitation. 

All these measures aim at making the residuals sensitive to faults and robust against disturbing effects.

10.3.1 Generation of enhanced residuals 

The primary residuals $r$ arising from the output model or the polynomial model do not necessarily allow the faults to be separated by observing their deviations. Furthermore, the state-space approach leaves some freedom in the design of the weighting matrix $W$. The idea of enhanced residuals is to give the residual vector special properties in order to diagnose, or at least isolate, the faults from each other. Two known approaches are the generation of structured and directional residuals, [10.8].

a) Structured residuals 

The goal of the design of structured residuals is that each fault influences some residuals and not others. Then, depending on the fault, certain patterns appear in the residual vector (or table). Generally speaking, this means that each fault moves the residual vector only within certain subspaces of the residual space, compare Figure 10.6a. Therefore, each structured residual is independent of (decoupled from) at least one of the faults. The resulting residual patterns are also called fault signatures.

b) Directional residuals 

The design of directional residuals aims at a specific direction of the residual vector in the residual space for each fault, such that the direction is fixed, but the length of the vector depends on the fault size, see Figure 10.6b. The direction of the residual vector should also be maintained during dynamic transients.




Fig. 10.6. Enhanced residuals: (a) structured residuals; (b) directional residuals 


The residuals are usually checked against a threshold $r_{th,i}$. It is now assumed that the threshold is the same for positive and negative deflections and that exceeding the maximal or minimal threshold is detected. This limit checking yields binary outputs

$$ r_i^* = \begin{cases} 0 & \text{if } |r_i(t)| < r_{th,i} \\ 1 & \text{if } |r_i(t)| \ge r_{th,i} \end{cases} \tag{10.48} $$


$r_i^* = 1$ means that one of the thresholds was exceeded. Then different isolating properties according to Table 10.2 can be distinguished. These structure matrices are called strongly isolating if an error in one residual does not lead to the isolation of another fault. Weak isolation means that an error in one residual leads to the isolation of another (wrong) fault. If the patterns are indistinguishable, the faults are not isolable. Some further special cases are discussed in [10.8].

Table 10.3 shows the possible patterns for two residuals. Hence, 3 faults can be (weakly) isolated, or generally

$$ k = 2^n - 1 $$

faults, if $n$ is the number of residuals.

More possibilities for the isolation and diagnosis of faults result if the signs and also the sizes of the residuals are taken into account. Including the sign leads to

Table 10.2. Structured residuals; 1 = residual is sensitive; 0 = residual is insensitive to the respective fault
Table 10.3. Structured residuals without considering the sign of $r_i$

| residuals | no fault | $f_1$ | $f_2$ | $f_3$ |
|---|---|---|---|---|
| $r_1^*$ | 0 | 1 | 0 | 1 |
| $r_2^*$ | 0 | 0 | 1 | 1 |
$$ r_i^* = \begin{cases} 0 & \text{if } r^-_{th,i} < r_i(t) < r^+_{th,i} \\ +1 & \text{if } r_i(t) \ge r^+_{th,i} \\ -1 & \text{if } r_i(t) \le r^-_{th,i} \end{cases} \tag{10.49} $$

Here, the positive and negative thresholds $r^+_{th,i}$ and $r^-_{th,i}$ can be different. A maximal set of patterns for two residuals in Table 10.4 shows that now 8, or generally

$$ k = 3^n - 1 $$

faults are isolable. For $n = 3$ residuals, the number of (weakly) distinguishable faults is 7 without sign and 26 with sign. Therefore, considering the sign of the residuals considerably increases the number of isolable faults.


Table 10.4. Structured residuals including the sign of $r_i$
A further improvement for diagnosis is obtained by using the size of the residual deflections. If the residuals are marked by

$$ r_i^* = \begin{cases} 0 & \text{if } r^-_{th,i} < r_i(t) < r^+_{th,i} \\ + & \text{if } r_i(t) \text{ is positive small} \\ ++ & \text{if } r_i(t) \text{ is positive large} \\ - & \text{if } r_i(t) \text{ is negative small} \\ -- & \text{if } r_i(t) \text{ is negative large} \end{cases} \tag{10.50} $$

then theoretically 24 combinations are possible for 2 residuals, see Table 10.5, or generally

$$ k = 5^n - 1 $$

In practical cases, of course, not all combinations will appear. But this consideration shows that the residuals become better isolable and allow a more detailed fault diagnosis if the sign and the size of the residual deflection are evaluated. This also improves the detection of multiple faults.
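The three evaluation rules (10.48) to (10.50) and the counting arguments $k = 2^n - 1$, $3^n - 1$ and $5^n - 1$ can be checked with a short Python sketch. The function names are illustrative, and since the text does not specify where "small" ends and "large" begins in (10.50), the factor `large_factor` is a hypothetical parameter.

```python
from itertools import product

def binary_eval(r, r_th):
    """(10.48): 1 if |r| reaches the symmetric threshold r_th, else 0."""
    return 1 if abs(r) >= r_th else 0

def sign_eval(r, r_th_minus, r_th_plus):
    """(10.49): +1 / -1 / 0 with possibly different thresholds (r_th_minus < 0 < r_th_plus)."""
    if r >= r_th_plus:
        return 1
    if r <= r_th_minus:
        return -1
    return 0

def size_eval(r, r_th_minus, r_th_plus, large_factor=3.0):
    """(10.50): five levels; 'large' means beyond large_factor times the threshold.

    large_factor is a hypothetical tuning parameter, not taken from the text.
    """
    if r_th_minus < r < r_th_plus:
        return "0"
    if r > 0:
        return "++" if r >= large_factor * r_th_plus else "+"
    return "--" if r <= large_factor * r_th_minus else "-"

def max_patterns(levels, n):
    """Distinct non-zero patterns of n residuals; levels[0] is 'no deviation'."""
    patterns = set(product(levels, repeat=n))
    patterns.discard((levels[0],) * n)
    return len(patterns)

# k = 2**n - 1, 3**n - 1 and 5**n - 1 for n = 1, 2, 3 residuals
for name, levels in [("binary", (0, 1)), ("sign", (0, 1, -1)),
                     ("sign and size", ("0", "+", "++", "-", "--"))]:
    print(name, [max_patterns(levels, n) for n in (1, 2, 3)])
```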


Table 10.5. Structured residuals including the sign and size of $r_i$

10.3.2 Generation of structured residuals


The goal in designing structured residuals is to generate residual vectors with good isolating patterns. This means that each residual should be independent of (decoupled from) at least one of the faults to be detected. As this is directly possible for additive faults $f_u$ on the inputs and $f_y$ on the outputs, mainly this case will be considered.

The polynomial error (or equation error) of a MIMO process with $p$ inputs and $r$ outputs is, according to (10.4),

$$ r(s) = A_m(s)\,y_p(s) - B_m(s)\,u_p(s) \tag{10.51} $$


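To generate structured residuals, this equation is multiplied by a residual generating matrix $W(s)$

$$ r^*(s) = W(s)\,[A_m(s)\,y_p(s) - B_m(s)\,u_p(s)] \tag{10.52} $$

or

$$ r^*(s) = W(s) \left[ \begin{bmatrix} A_1(s) & 0 & \cdots & 0 \\ 0 & A_2(s) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & A_r(s) \end{bmatrix} \begin{bmatrix} y_1(s) \\ y_2(s) \\ \vdots \\ y_r(s) \end{bmatrix} - \begin{bmatrix} B_1(s) & 0 & \cdots & 0 \\ 0 & B_2(s) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & B_p(s) \end{bmatrix} \begin{bmatrix} u_1(s) \\ u_2(s) \\ \vdots \\ u_p(s) \end{bmatrix} \right] \tag{10.53} $$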
To obtain zeros in $r^*$, the matrix $W$ has to be selected such that the elements of $r^*$ become independent of one measurement each (elements of $y_p$ and $u_p$). Therefore

$$ W_y(s)\,A_m(s) = 0 \quad \text{and} \quad W_u(s)\,B_m(s) = 0 \tag{10.54} $$

with $W^T = [W_y \;\; W_u]$.

Changing the notation of (10.52) leads to

$$ r^*(s) = W(s)\,[A_1(s)\,y_1(s) + A_2(s)\,y_2(s) + \dots - B_1(s)\,u_1(s) - B_2(s)\,u_2(s) - \dots] \tag{10.55} $$

where $A_j(s)$ and $B_j(s)$ now denote the columns belonging to $y_j$ and $u_j$. With

$$ \begin{aligned} w_{y1}^T(s)\,A_1(s) &= 0 \quad \text{independence of } y_1(s) \\ w_{y2}^T(s)\,A_2(s) &= 0 \quad \text{independence of } y_2(s) \\ w_{u1}^T(s)\,B_1(s) &= 0 \quad \text{independence of } u_1(s) \\ w_{u2}^T(s)\,B_2(s) &= 0 \quad \text{independence of } u_2(s) \end{aligned} $$

the required decoupling is reached. In this way, the (dynamic) polynomials $w_j(s)$ are designed which give the residuals $r^*(s)$ according to (10.52) the required independence of specific input and output signals.

This procedure will be demonstrated by two examples. 


Example 10.2: Structured residuals for a single-input two-output process 


Two parallel first-order processes with one input $u$ and two outputs $y_1$ and $y_2$ are considered, Figure 10.7,

$$ G_1(s) = \frac{y_1(s)}{u(s)} = \frac{K_1}{1 + T_1 s}, \qquad G_2(s) = \frac{y_2(s)}{u(s)} = \frac{K_2}{1 + T_2 s} $$

(The input could be the steering angle $\delta$ of a vehicle and the outputs the yaw rate $\dot\psi$ and the lateral acceleration $\ddot y$; the goal is to detect offset faults in all 3 sensors.) The rearranged process equations are

$$ \begin{aligned} 0 &= y_1(s) + T_1 s\,y_1(s) - K_1 u(s) \\ 0 &= y_2(s) + T_2 s\,y_2(s) - K_2 u(s) \end{aligned} $$

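or in vector notation

$$ \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 + T_1 s \\ 0 \end{bmatrix} y_1(s) + \begin{bmatrix} 0 \\ 1 + T_2 s \end{bmatrix} y_2(s) + \begin{bmatrix} -K_1 \\ -K_2 \end{bmatrix} u(s) $$

To obtain independence of the residuals from each of the 3 signals, this equation is multiplied by the residual generating matrix

$$ W(s) = \begin{bmatrix} w_{y1}^T(s) \\ w_{y2}^T(s) \\ w_u^T(s) \end{bmatrix} $$

Independence of $y_1$, $y_2$ and $u$ is reached by

$$ \begin{aligned} w_{y1}^T(s) \begin{bmatrix} 1 + T_1 s \\ 0 \end{bmatrix} &= 0 \;\rightarrow\; w_{y1}^T(s) = [0 \;\; 1] \\ w_{y2}^T(s) \begin{bmatrix} 0 \\ 1 + T_2 s \end{bmatrix} &= 0 \;\rightarrow\; w_{y2}^T(s) = [1 \;\; 0] \\ w_u^T(s) \begin{bmatrix} -K_1 \\ -K_2 \end{bmatrix} &= 0 \;\rightarrow\; w_u^T(s) = [K_2 \;\; -K_1] \end{aligned} $$

After multiplying the vector equation by $W(s)$, three residuals result

$$ \begin{aligned} r_1^*(s) &= (1 + T_2 s)\,y_2(s) - K_2 u(s) \\ r_2^*(s) &= (1 + T_1 s)\,y_1(s) - K_1 u(s) \\ r_3^*(s) &= K_2 (1 + T_1 s)\,y_1(s) - K_1 (1 + T_2 s)\,y_2(s) \end{aligned} $$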


It is interesting to see that these residuals also follow directly from the transfer functions

$$ \frac{y_2(s)}{u(s)} = G_2(s), \qquad \frac{y_1(s)}{u(s)} = G_1(s), \qquad \frac{y_1(s)}{y_2(s)} = \frac{G_1(s)}{G_2(s)} $$

if, after cross-multiplication, all terms are brought to one side and 0 is replaced by $r^*$.

Table 10.6. Fault signatures of different faults

These residuals give the signatures in Table 10.6.

Hence, the residuals are strongly isolating for the offset faults $f_u$, $f_{y1}$ and $f_{y2}$, even if their signs are different. Gain changes $\Delta K_1$ and $\Delta K_2$ are only distinguishable from $f_{y1}$ and $f_{y2}$ if their deviations have the same sign. As this is usually not known, they are practically indistinguishable. These results hold for changing as well as for constant inputs. Changes of the time constants lead to residual deviations only if the input $u(t)$ changes. But they result only in small, transient residuals and are practically not detectable.

The residuals contain derivatives of the output signals, which may be a problem if higher-frequency noise is present. Then low-pass filters have to be applied, which also damp the residual responses.

If the same procedure is applied to the output error (10.6), the following residuals result:

$$ \begin{aligned} r_1^{\prime *}(s) &= y_2(s) - \frac{K_2}{1 + T_2 s}\,u(s) \\ r_2^{\prime *}(s) &= y_1(s) - \frac{K_1}{1 + T_1 s}\,u(s) \\ r_3^{\prime *}(s) &= \frac{K_2}{1 + T_2 s}\,y_1(s) - \frac{K_1}{1 + T_1 s}\,y_2(s) \end{aligned} $$

This set of equations does not need additional low-pass filters because no realizability problems exist.

Summarizing, additive faults of all 3 signal measurements can be isolated by structured parity equations.

□ 
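The signatures of Example 10.2 can be reproduced numerically with the output-error residuals $r_1^{\prime *}$, $r_2^{\prime *}$ and $r_3^{\prime *}$. The Python sketch below is illustrative only: the parameter values $K_1 = 2$, $T_1 = 0.5$ s, $K_2 = 1.5$, $T_2 = 1$ s, the constant input $u = 1$, the explicit-Euler discretization, the threshold and the function names are assumptions, not values from the text. The residuals are evaluated after the transients have decayed, i.e. in steady state.

```python
def lag_step(state, inp, K, T, dt):
    """One explicit-Euler step of the first-order lag K / (1 + T s)."""
    return state + dt / T * (K * inp - state)

def output_error_residuals(f_u=0.0, f_y1=0.0, f_y2=0.0, K1=2.0, T1=0.5,
                           K2=1.5, T2=1.0, u=1.0, dt=0.01, steps=3000):
    """Residuals (r1'*, r2'*, r3'*) of Example 10.2 after `steps` samples.

    f_u, f_y1, f_y2 are constant additive sensor offsets; all parameter
    values are illustrative assumptions, and the model equals the process.
    """
    y1 = y2 = 0.0        # true process outputs
    m1 = m2 = 0.0        # model outputs K_i/(1+T_i s) driven by the measured input
    g2y1 = g1y2 = 0.0    # K2/(1+T2 s) y1_meas and K1/(1+T1 s) y2_meas
    r = (0.0, 0.0, 0.0)
    for _ in range(steps):
        u_meas, y1_meas, y2_meas = u + f_u, y1 + f_y1, y2 + f_y2
        r = (y2_meas - m2, y1_meas - m1, g2y1 - g1y2)
        y1, y2 = lag_step(y1, u, K1, T1, dt), lag_step(y2, u, K2, T2, dt)
        m1, m2 = lag_step(m1, u_meas, K1, T1, dt), lag_step(m2, u_meas, K2, T2, dt)
        g2y1 = lag_step(g2y1, y1_meas, K2, T2, dt)
        g1y2 = lag_step(g1y2, y2_meas, K1, T1, dt)
    return r

def signature(residuals, threshold=0.1):
    """Three-valued evaluation as in (10.49) with symmetric thresholds."""
    return tuple(0 if abs(x) < threshold else (1 if x > 0 else -1) for x in residuals)

for label, fault in [("no fault", {}), ("+f_u", {"f_u": 0.5}),
                     ("+f_y1", {"f_y1": 0.5}), ("+f_y2", {"f_y2": 0.5})]:
    print(label, signature(output_error_residuals(**fault)))
```

In steady state, an input offset $f_u$ gives $r_1^{\prime *} = -K_2 f_u$, $r_2^{\prime *} = -K_1 f_u$ and $r_3^{\prime *} = 0$, while each output offset leaves one residual at zero, so the three single-fault patterns differ from each other. These computed patterns follow from the residual equations above and are not a copy of Table 10.6.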


Example 10.3: Structured residuals for a DC motor 

As a further example a DC motor with brush commutation is considered. Modelling, 
resulting equations and a signal-flow diagram are shown in Chapter 20. By neglect¬ 
ing minor effects, a classical DC motor can be described by linear differential equa¬ 
tions and the signal flow diagram in Figure 10.8. For the electrical part, the armature 
circuit, it holds 

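$$ U_A(t) = R_A I_A(t) + L_A \dot I_A(t) + \Psi\,\omega(t) $$

and for the mechanical part

$$ \Psi I_A(t) = J \dot\omega(t) + M_F\,\omega(t) + M_L(t) $$

(The symbols are explained in Chapter 20.)

Fig. 10.8. Signal flow diagram of a DC motor. Parameters: $R_A = 1.53\,\Omega$; $L_A = 6.8 \cdot 10^{-3}\,\Omega\text{s}$; $\Psi = 0.34\,\text{Vs}$; $M_F = 0.37 \cdot 10^{-3}\,\text{Nms}$; $J = 1.9 \cdot 10^{-3}\,\text{kg m}^2$

Laplace transformation leads to

$$ \begin{aligned} 0 &= U_A(s) - R_A I_A(s) - L_A s\,I_A(s) - \Psi\,\omega(s) \\ 0 &= \Psi I_A(s) - J s\,\omega(s) - M_F\,\omega(s) - M_L(s) \end{aligned} $$

or in vector notation

$$ 0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} U_A(s) - \begin{bmatrix} R_A + L_A s \\ -\Psi \end{bmatrix} I_A(s) - \begin{bmatrix} \Psi \\ M_F + J s \end{bmatrix} \omega(s) - \begin{bmatrix} 0 \\ 1 \end{bmatrix} M_L(s) $$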

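To obtain structured residuals, multiplication with the residual generation matrix yields

$$ r(s) = W(s) \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} U_A(s) - \begin{bmatrix} R_A + L_A s \\ -\Psi \end{bmatrix} I_A(s) - \begin{bmatrix} \Psi \\ M_F + J s \end{bmatrix} \omega(s) - \begin{bmatrix} 0 \\ 1 \end{bmatrix} M_L(s) \right\} $$

Independence of the residuals from the measured signals and the unknown load torque $M_L$ is obtained by

$$ \begin{aligned} w_1^T(s) \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 0 &\;\rightarrow\; w_1^T(s) = [0 \;\; 1] && \text{(independent of } U_A) \\ w_2^T(s) \begin{bmatrix} R_A + L_A s \\ -\Psi \end{bmatrix} = 0 &\;\rightarrow\; w_2^T(s) = [\Psi \;\; R_A + L_A s] && \text{(independent of } I_A) \\ w_3^T(s) \begin{bmatrix} \Psi \\ M_F + J s \end{bmatrix} = 0 &\;\rightarrow\; w_3^T(s) = [M_F + J s \;\; -\Psi] && \text{(independent of } \omega) \\ w_4^T(s) \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 0 &\;\rightarrow\; w_4^T(s) = [1 \;\; 0] && \text{(independent of } M_L) \end{aligned} $$

Hence, the residual generation matrix becomes

$$ W(s) = \begin{bmatrix} 0 & 1 \\ \Psi & R_A + L_A s \\ M_F + J s & -\Psi \\ 1 & 0 \end{bmatrix} $$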
and the resulting residuals in the time domain are

$$ \begin{aligned} r_1(t) &= \Psi I_A(t) - J \dot\omega(t) - M_F\,\omega(t) - M_L(t) \\ r_2(t) &= \Psi U_A(t) - (\Psi^2 + R_A M_F)\,\omega(t) - (R_A J + L_A M_F)\,\dot\omega(t) - L_A J\,\ddot\omega(t) - R_A M_L(t) - L_A \dot M_L(t) \\ r_3(t) &= M_F U_A(t) - (R_A M_F + \Psi^2)\,I_A(t) - (M_F L_A + J R_A)\,\dot I_A(t) - L_A J\,\ddot I_A(t) + \Psi M_L(t) + J \dot U_A(t) \\ r_4(t) &= U_A(t) - L_A \dot I_A(t) - R_A I_A(t) - \Psi\,\omega(t) \end{aligned} $$

These residuals require the first-order derivatives of $U_A(t)$, $I_A(t)$ and $\omega(t)$ and the second-order derivatives of $I_A(t)$ and $\omega(t)$. This is a practical problem and can only be solved by low-pass filtering of the measured signals, especially by using state-variable filters, see [10.14] and Chapter 23.2. A drawback is the required knowledge of the load torque $M_L$ for all residuals except $r_4$.

As already mentioned, the residuals can also be obtained if the Laplace-transformed state variables $\omega(s)$ and $I_A(s)$ are calculated, based on the signal flow diagram of Figure 10.8 or the two basic equations, as functions of the following measurable variables and input signals

$$ \begin{aligned} \omega(s) &= f[I_A(s), M_L(s)] && \text{independent of } U_A(s) \\ \omega(s) &= f[U_A(s), M_L(s)] && \text{independent of } I_A(s) \\ I_A(s) &= f[U_A(s), M_L(s)] && \text{independent of } \omega(s) \\ I_A(s) &= f[U_A(s), \omega(s)] && \text{independent of } M_L(s) \end{aligned} $$

For example, the first equation yields

$$ 0 = \Psi I_A(s) - J s\,\omega(s) - M_F\,\omega(s) - M_L(s) $$

If the zero is then replaced by $r_1(s)$, the corresponding residual equation results. Hence, the residuals form special transfer functions of the process, in each of which one measurement is eliminated.

The resulting fault signatures of the residuals for additive and parametric faults are given in Table 10.7. Hence, the residuals are strongly isolating for the additive offset faults of the sensors of $U_A$, $I_A$ and $\omega$, and for load changes $\Delta M_L$. Parameter changes of $R_A$ and $L_A$ and of $J$ and $M_F$ are practically not distinguishable and are only weakly isolated. A change of $\Psi$ is also only weakly isolated from the others. Under steady-state conditions, changes $\Delta R_A$ are not distinguishable from $\Delta U_A$, and changes $\Delta M_F$ not from $\Delta M_L$. Summarizing, additive sensor faults can be detected and isolated under the condition that the process parameters remain constant. The considered parity equations are not suitable to detect and isolate parameter faults. An application of parity equations to real DC motors is shown in Chapter 20.
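The additive-fault columns of Table 10.7 can be checked in steady state, where all time derivatives in $r_1$ to $r_4$ vanish. The Python sketch below uses the parameters of Fig. 10.8; the operating point ($U_A = 10$ V, $M_L = 0.1$ Nm), the fault sizes and the function names are assumptions, and $f_{M_L}$ is modelled as an offset of the load torque value used in the residuals. The tolerance of $10^{-9}$ only separates exact zeros from rounding errors; a real application would need noise-dependent thresholds.

```python
# Parameters of Fig. 10.8 (SI units)
R_A, L_A, PSI, M_F, J = 1.53, 6.8e-3, 0.34, 0.37e-3, 1.9e-3

def steady_state(U_A, M_L):
    """Solve U_A = R_A I_A + PSI w and PSI I_A = M_F w + M_L for (I_A, w)."""
    w = (PSI * U_A - R_A * M_L) / (PSI ** 2 + R_A * M_F)
    I_A = (M_F * w + M_L) / PSI
    return I_A, w

def residuals_steady(U_A, I_A, w, M_L):
    """r1 ... r4 of Example 10.3 with all time derivatives set to zero."""
    r1 = PSI * I_A - M_F * w - M_L
    r2 = PSI * U_A - (PSI ** 2 + R_A * M_F) * w - R_A * M_L
    r3 = M_F * U_A - (R_A * M_F + PSI ** 2) * I_A + PSI * M_L
    r4 = U_A - R_A * I_A - PSI * w
    return r1, r2, r3, r4

def signature(values, tol=1e-9):
    """Sign of each residual; |r| < tol counts as 0."""
    return tuple(0 if abs(v) < tol else (1 if v > 0 else -1) for v in values)

def single_fault_signature(f_UA=0.0, f_IA=0.0, f_w=0.0, f_ML=0.0,
                           U_A=10.0, M_L=0.1):
    """Residual signs for offsets on the measured (or assumed) signals."""
    I_A, w = steady_state(U_A, M_L)
    return signature(residuals_steady(U_A + f_UA, I_A + f_IA, w + f_w, M_L + f_ML))

for name, fault in [("+f_UA", {"f_UA": 1.0}), ("+f_IA", {"f_IA": 0.1}),
                    ("+f_w", {"f_w": 1.0}), ("+f_ML", {"f_ML": 0.01})]:
    print(name, single_fault_signature(**fault))
```

Under these assumptions, the printed sign patterns for $(r_1, r_2, r_3, r_4)$ coincide with the additive-fault columns of Table 10.7.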

This example of the DC motor can also be used to study the behavior of the residuals in the case of multiple faults. Table 10.8 shows the result if two additive sensor faults arise simultaneously and only the sign of the residuals, not their size, is considered. The effects of the single faults on the residuals partially compensate and partially add to each other. However, in this case all fault signatures are different and
Table 10.7. Fault signatures of a DC motor for different single faults; * means that deviations are only obtained by dynamic excitation of the process (columns +f_UA to +f_ML: additive sensor faults; columns +f_RA to +f_MF: parameter faults)

| residual | +f_UA | +f_IA | +f_ω | +f_ML | +f_RA | +f_LA | +f_Ψ | +f_J | +f_MF |
|---|---|---|---|---|---|---|---|---|---|
| r1 | 0 | +1 | -1 | -1 | 0 | 0 | +1 | -1* | -1 |
| r2 | +1 | 0 | -1 | -1 | -1 | -1* | ±1 | -1* | -1 |
| r3 | +1 | -1 | 0 | +1 | -1 | -1* | ±1 | ±1* | ±1 |
| r4 | +1 | -1 | -1 | 0 | -1 | -1* | -1 | 0 | 0 |

Table 10.8. Fault signatures of a DC motor for double additive sensor faults

| residual | f_UA and f_IA | f_UA and f_ω | f_UA and f_ML | f_IA and f_ω | f_IA and f_ML |
|---|---|---|---|---|---|
| r1 | +1 | -1 | -1 | 0 | 0 |
| r2 | +1 | 0 | 0 | -1 | -1 |
| r3 | 0 | +1 | +1 | -1 | 0 |
| r4 | 0 | 0 | +1 | -1 | -1 |

mostly even strongly isolating. Hence, by taking the signatures of Table 10.8 into 
account, it is possible to detect and isolate also double sensor faults. 

□ 
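Table 10.8 can be rebuilt from the single-fault signatures of Table 10.7 by a sign-level superposition: deflections of the same sign reinforce each other, and deflections of opposite sign are treated as cancelling to 0. This is a simplifying assumption of the Python sketch below; with actual fault magnitudes, opposite deflections will in general not cancel exactly. The dictionary and function names are illustrative.

```python
# Additive single-fault signatures (r1, r2, r3, r4), taken from Table 10.7
SINGLE = {
    "f_UA": (0, 1, 1, 1),
    "f_IA": (1, 0, -1, -1),
    "f_w": (-1, -1, 0, -1),
    "f_ML": (-1, -1, 1, 0),
}

# The five double faults listed in Table 10.8
PAIRS = [("f_UA", "f_IA"), ("f_UA", "f_w"), ("f_UA", "f_ML"),
         ("f_IA", "f_w"), ("f_IA", "f_ML")]

def sign(x):
    return (x > 0) - (x < 0)

def double_signature(a, b):
    """Sign-level superposition: equal signs reinforce, opposite signs cancel to 0."""
    return tuple(sign(x + y) for x, y in zip(SINGLE[a], SINGLE[b]))

DOUBLE = {pair: double_signature(*pair) for pair in PAIRS}
for pair, sig in DOUBLE.items():
    print(" and ".join(pair), sig)

# Isolability check: all single and double patterns should be distinct
all_patterns = list(SINGLE.values()) + list(DOUBLE.values())
print("all distinct:", len(set(all_patterns)) == len(all_patterns))
```

The five computed columns agree with Table 10.8, and together with the four single-fault patterns all nine signatures are distinct.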


10.3.3 Sensitivity of parity equations 


The sensitivity of structured residuals with regard to additive and parametric faults 
is considered in the sequel, following [10.15], see also [10.9]. 

Writing (10.26) or (10.51) for one residual leads to

$$ r(t) = \begin{bmatrix} w^T & -w^T Q \end{bmatrix} \begin{bmatrix} Y(t) \\ U(t) \end{bmatrix} = \theta^T(p)\,\psi(t) \tag{10.56} $$

where $\theta^T(p)$ contains the process model parameters as functions of the physical parameters $p$, and $\psi(t)$ contains the measured input and output signals and their derivatives.

Small additive faults $\Delta\psi(t)$ then yield, with $\theta_0^T(p)$ and $\psi_0(t)$ for the fault-free case,

$$ r(t) = \theta_0^T(p)\,[\psi_0(t) + \Delta\psi(t)] = \theta_0^T(p)\,\Delta\psi(t) \tag{10.57} $$

as

$$ \theta_0^T(p)\,\psi_0(t) = 0 \tag{10.58} $$

if there is no fault. If the process parameters remain constant, the sensitivity becomes

$$ \frac{\partial r(t)}{\partial \psi(t)} = \theta_0^T(p) \tag{10.59} $$

The residual is therefore proportional to the additive faults $\Delta\psi(t)$ with the sensitivity coefficient (gain) $\theta_0^T(p)$ and does not depend on the input excitation $U(t)$.
In the case of parametric faults $\Delta p$, the residuals are calculated according to (10.56) as

$$ r(t) = \theta_0^T(p)\,\psi(t) \tag{10.60} $$

where $\theta_0(p)$ are the fixed parameters of the model. In the fault-free case, $r(t) = 0$. After a small fault in the parameters, the process follows a new differential equation

$$ \begin{aligned} 0 &= \theta_0^T(p)\,\psi_0(t) + \frac{\partial}{\partial p}\left[\theta^T(p)\,\psi(t)\right] \Delta p \\ &= \theta_0^T(p)\,\psi_0(t) + \frac{\partial \theta^T(p)}{\partial p}\,\psi(t)\,\Delta p + \theta_0^T(p)\,\frac{\partial \psi(t)}{\partial p}\,\Delta p \\ &= \theta_0^T(p)\,[\psi_0(t) + \Delta\psi(t)] + \frac{\partial \theta^T(p)}{\partial p}\,\psi(t)\,\Delta p \end{aligned} \tag{10.61} $$

where

$$ \Delta\psi(t) = \frac{\partial \psi(t)}{\partial p}\,\Delta p \tag{10.62} $$

describes the changes of the signals due to the parameter changes $\Delta p$.

The residual calculated with the fixed parameters according to (10.60), but with the changed signals, then becomes

$$ r(t) = \theta_0^T(p)\,[\psi_0(t) + \Delta\psi(t)] \tag{10.63} $$

and from (10.61) it follows that

$$ r(t) = -\frac{\partial \theta^T(p)}{\partial p}\,\psi(t)\,\Delta p \tag{10.64} $$

The residual sensitivity to parametric faults is therefore

$$ \frac{\partial r(t)}{\partial p} = -\frac{\partial \theta^T(p)}{\partial p}\,\psi(t) \tag{10.65} $$

and depends on the parameter sensitivity of the process model with regard to physical parameter changes $\Delta p$ and on the measured signals $\psi(t)$. Therefore, the sensitivity of parity equations to parametric faults depends on the input excitation $U(t)$ of the process. For $U(t) = 0$, no parametric faults can be detected.
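For a first-order model $K/(1 + T s)$, the residual (10.56) reads $r = y + T\dot y - K u$, i.e. $\theta^T = [1,\; T,\; -K]$ and $\psi = [y,\; \dot y,\; u]^T$. The following Python sketch, with hypothetical parameter values and an explicit-Euler process simulation, contrasts (10.59) and (10.65): an additive output offset $f_y$ appears in the residual with gain 1 independently of $u$, whereas a gain change $\Delta K$ gives $r = -(\partial\theta^T/\partial K)\,\psi\,\Delta K = u\,\Delta K$, which vanishes for $u = 0$.

```python
def equation_error_residual(K_true, u, f_y=0.0, K=2.0, T=0.5, dt=0.01, steps=500):
    """Residual r = y + T dy/dt - K u of the model K/(1 + T s), cf. (10.56).

    The process is simulated by explicit Euler with gain K_true; the model
    keeps the nominal K (fixed parameters as in (10.60)); f_y is an additive
    offset of the output sensor. All values are illustrative assumptions.
    """
    y, r = 0.0, 0.0
    for _ in range(steps):
        y_next = y + dt / T * (K_true * u - y)
        y_meas, y_next_meas = y + f_y, y_next + f_y
        dy = (y_next_meas - y_meas) / dt      # forward difference, exact for Euler
        r = y_meas + T * dy - K * u
        y = y_next
    return r

for u in (0.0, 1.0, 2.0):
    r_add = equation_error_residual(2.0, u, f_y=0.3)   # additive fault only
    r_par = equation_error_residual(2.2, u)            # parametric fault dK = 0.2
    print(f"u = {u}: additive -> r = {r_add:.3f}, parametric -> r = {r_par:.3f}")
```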


10.4 Parity equations for nonlinear processes 

10.4.1 Parity equations for special nonlinear processes 

Because of the large variety of nonlinear process models, it is not possible to describe generally applicable ways to generate parity equations. If, however, the nonlinear model can be expressed by

$$ y_m(t) = f_{NL}[\dot y(t), \ddot y(t), \dots;\; u(t), \dot u(t), \ddot u(t), \dots] \tag{10.66} $$

output residuals

$$ r(t) = y_p(t) - y_m(t) \tag{10.67} $$

can be generated directly. This also holds for discrete-time polynomial models like the Hammerstein model or the parametric Volterra model, described in Section 9.3.2. The output residual for the Hammerstein model is, for example,

$$ r'(k) = y_p(k) - \left[ (1 - A_m(q^{-1}))\,y_m(k) + B_{1m}(q^{-1})\,u(k) + B_{2m}(q^{-1})\,u^2(k) + \dots \right] \tag{10.68} $$

For these nonlinear polynomial models, even equation error residuals are possible, like

$$ r(k) = y_p(k) - \left[ (1 - A_m(q^{-1}))\,y_p(k) + B_{1m}(q^{-1})\,u(k) + B_{2m}(q^{-1})\,u^2(k) + \dots \right] \tag{10.69} $$
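As an illustration of (10.68) and (10.69), the Python sketch below uses a first-order Hammerstein model with a quadratic input nonlinearity, $A_m(q^{-1}) = 1 + a_1 q^{-1}$, $B_{1m}(q^{-1}) = b_1 q^{-1}$ and $B_{2m}(q^{-1}) = b_2 q^{-1}$. The coefficients, the input sequence and the function name are hypothetical, and the model is assumed to match the process exactly. For a constant output sensor offset $f_y$, the output residual equals $f_y$, whereas the equation-error residual equals $A_m(1)\,f_y = (1 + a_1) f_y$.

```python
def hammerstein_residuals(u, a1=-0.8, b1=0.5, b2=0.2, f_y=0.0):
    """Output residual r'(k), cf. (10.68), and equation-error residual r(k), cf. (10.69).

    Assumed first-order Hammerstein model y(k) = -a1 y(k-1) + b1 u(k-1) + b2 u(k-1)^2;
    process and model are identical, f_y is a constant offset of the output sensor.
    """
    n = len(u)
    y = [0.0] * n                    # true process output
    y_m = [0.0] * n                  # parallel model output, driven by u only
    for k in range(1, n):
        y[k] = -a1 * y[k - 1] + b1 * u[k - 1] + b2 * u[k - 1] ** 2
        y_m[k] = -a1 * y_m[k - 1] + b1 * u[k - 1] + b2 * u[k - 1] ** 2
    y_p = [v + f_y for v in y]       # measured output
    r_out = [y_p[k] - y_m[k] for k in range(n)]
    r_eq = [y_p[k] - (-a1 * y_p[k - 1] + b1 * u[k - 1] + b2 * u[k - 1] ** 2)
            for k in range(1, n)]
    return r_out, r_eq

u = [1.0 if (k // 10) % 2 == 0 else -0.5 for k in range(60)]
r_out, r_eq = hammerstein_residuals(u, f_y=0.5)
print(max(r_out), min(r_out))   # output residual: equal to the offset 0.5
print(max(r_eq), min(r_eq))     # equation error: (1 + a1) * 0.5 = 0.1
```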

In Section 5.2.4 a bilinear state-variable model is considered, where the input signal $u(k)$ is multiplied by the state variables $x(k)$. [10.14] has shown for the continuous-time case how parity equations can be generated by decoupling them from the bilinear term $u_i x$, similar to Section 10.2.1. Artificial neural networks, as described in Section 9.3.3, can be applied directly to generate output residuals, see, e.g., some case studies in [10.18].

The structure of the nonlinear process models depends strongly on the considered 
process. 

10.4.2 Parity equations for nonlinear, local linear models¹

The considered approach is based on a Takagi-Sugeno (TS) fuzzy model of the nominal process, [10.25], and follows [10.2]. The model can be built from heuristic knowledge and/or by means of identification algorithms from measurement data. The transparent inner structure of the model is used for the generation of symptoms that indicate the occurrence and the location of sensor faults. This enables continuous fault detection for nonlinear processes in all ranges of operation. The model is run in parallel (multiple-step prediction) and in series-parallel (one-step prediction) to the process, which leads to symptoms with different sensitivity to the faults. The suitability of the two models is discussed, and in the final fault-detection scheme both approaches are combined in order to exploit the advantages of each. The applicability of the proposed method is illustrated for an industrial-scale heat-exchanger pilot plant, see [10.2], [10.18].

a) Fuzzy-neuro process models 

In the discussion that follows, MISO (multi-input single-output) processes are con¬ 
sidered. The description is made in the discrete time domain, but can directly be 
transferred to the continuous-time domain. 

A MISO process with $m$ inputs $u(z)$ and one output $y(z)$ can generally be described in the z-domain by the following nonlinear dynamic function

$$ y(z) = f_{NL}(u(z)); \qquad u(z) = [u_1(z), u_2(z), \dots, u_m(z)]^T \tag{10.70} $$

¹ This section follows Peter Balle, [10.3].

The function $f_{NL}(\cdot)$ is nonlinear and dynamic and therefore contains terms of $z^{-i}$ with $i = 1, \dots, n$. The function $f_{NL}(\cdot)$ is now modelled by a Takagi-Sugeno fuzzy model with linear functions in the rule consequent part. The rule base of the TS model consists of $M$ rules of the form

$$ R_i:\; \text{IF } \langle u_{p1} \text{ is } P_{i1} \rangle \text{ AND } \cdots \text{ AND } \langle u_{pp} \text{ is } P_{ip} \rangle \text{ THEN } \langle y(z) = y_i(z) \rangle $$

The rule consequent part is a linear dynamic function

$$ \begin{aligned} y_i(z) &= B_{i1}(z)\,u_{c1}(z) + \dots + B_{ic}(z)\,u_{cc}(z) - A_i(z)\,y(z) \\ \text{with} \quad B_{ik}(z) &= b_{0ik} + b_{1ik} z^{-1} + \dots + b_{nik} z^{-n} \\ A_i(z) &= a_{1i} z^{-1} + \dots + a_{ni} z^{-n} \end{aligned} \tag{10.71} $$

Here, $b_{0ik}, \dots, b_{nik}$ are the parameters of the linear regression model of the $i$-th rule with respect to input $u_{ck}(z)$; $B_{ik}(z)$ and $A_i(z)$ are polynomials in $z$ (MA filters). The operator $z$ is omitted in the following for the sake of simplicity. The inputs $u_c = [u_{c1}, u_{c2}, \dots, u_{cc}]$ of the rule consequent part are a subset of all inputs $u$: $u_c \subseteq u$. The inputs of the rule premise parts are defined by $u_p = [u_{p1}, u_{p2}, \dots, u_{pp}]$ and are also a subset of the model inputs $u$: $u_p \subseteq u$. The inputs $u_c$ define the local linear dynamic behavior of the process, while $u_p$ should contain all variables that influence its nonlinear (operating-point dependent) characteristic (this corresponds to $z$ in Section 9.3.3c). In many applications it is useful or even necessary to choose different input spaces $u_p$ and $u_c$ for the rule premise part and the rule consequent part. The fuzzy sets $P_{ij}$ are defined over the universe of discourse of $u_{pj}$. In this approach, normalized Gaussian functions are applied. With the product operator as t-norm, the model output is calculated as the weighted sum over all $M$ rule consequents:

$$ y(z) = \sum_{i=1}^{M} \left( B_{i1}(z)\,u_1(z) + \dots + B_{im}(z)\,u_m(z) - A_i(z)\,y(z) \right) \phi_i(u_p, c_i, \sigma_i) \tag{10.72} $$

where $\phi_i(u_p, c_i, \sigma_i)$ is the normalized Gaussian weighting function for the $i$-th model with center $c_i$ and standard deviation $\sigma_i$,

$$ \phi_i(u_p, c_i, \sigma_i) = \frac{\mu_i(u_p, c_i, \sigma_i)}{\sum_{j=1}^{M} \mu_j(u_p, c_j, \sigma_j)} \tag{10.73} $$


This calculation of the model output leads to a nonlinear interpolation between the rule consequents. Figure 10.9 shows a simple numerical example for a static system with two inputs $(x_1, x_2)$, three local linear models and three rules of the form

IF < $x_1$ is large > AND < $x_2$ is small > THEN < $y = y_1$ >

IF < $x_2$ is large > THEN < $y = y_2$ >

IF < $x_1$ is small > AND < $x_2$ is small > THEN < $y = y_3$ >

Fig. 10.9. Example of a TS fuzzy model with three membership functions $\mu_i$ and three rules with linear regression models in the consequent parts. On the left, the membership functions and the linear regression models are displayed; on the right, the overall model output is shown.


The advantage of the model is the interpretability of the rule consequents: they describe the process at the operating point defined by the rule premise.

Different algorithms exist for the construction of the model. Here, the LOcal LInear MOdel Tree (LOLIMOT) algorithm proposed in [10.20] is applied, see Section 9.3.3. The parameters $c_i$ and $\sigma_i$ are determined by a tree-like construction algorithm. After the structure has been determined in an outer loop, the parameters of the rule consequents can be estimated by a weighted linear least-squares algorithm. For alternative construction approaches refer, e.g., to [10.24].

Below, $\phi_i$ is used instead of $\phi_i(u_p, c_i, \sigma_i)$ for the sake of simplicity. (10.72) can then also be written in the form

$$ \begin{aligned} y(z) &= \Big(\sum_{i=1}^{M} B_{i1}(z)\,\phi_i\Big) u_1(z) + \dots + \Big(\sum_{i=1}^{M} B_{im}(z)\,\phi_i\Big) u_m(z) - \Big(\sum_{i=1}^{M} A_i(z)\,\phi_i\Big) y(z) \\ &= B_1(z, u_p)\,u_1(z) + \dots + B_m(z, u_p)\,u_m(z) - A(z, u_p)\,y(z) \end{aligned} \tag{10.74} $$

Here, each parameter of the polynomials $B_j$ and $A$ is a nonlinear interpolation between the parameters of the rule consequent parts. This is similar to a linear function with time-variant parameters, Figure 10.10. The superimposed parameters describe the dynamic behavior of the process at the current operating point. Therefore, [10.6] refers to this as a dynamic linearization of the fuzzy model. Dynamic linearization has some appealing advantages compared to normal linearization. Applying the latter to Takagi-Sugeno fuzzy models results in "bubble" effects, [10.1], which are caused by the rule premise part. Dynamic linearization overcomes this disadvantage, [10.6], [10.21]. The described approach can also be seen as a neural radial basis function weighting of local linear models, as used for the LOLIMOT approach, [10.21].
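The dynamic linearization (10.73), (10.74) can be sketched for one scalar premise input $u_p$ and two local first-order models $y(k) = -a_i\,y(k-1) + b_i\,u(k-1)$. The Python example below only shows how the normalized Gaussian weights interpolate the local parameters; the centers, widths, local parameters and function names are hypothetical, and the LOLIMOT construction of these quantities is not shown.

```python
import math

def validity(u_p, centers, sigmas):
    """Normalized Gaussian weights phi_i(u_p) of (10.73) for a scalar premise input."""
    mu = [math.exp(-0.5 * ((u_p - c) / s) ** 2) for c, s in zip(centers, sigmas)]
    total = sum(mu)
    return [m / total for m in mu]

def linearized_params(u_p, local_params, centers, sigmas):
    """Interpolated parameters (a, b) as in (10.74): sums of phi_i times (a_i, b_i)."""
    phi = validity(u_p, centers, sigmas)
    a = sum(p * a_i for p, (a_i, _) in zip(phi, local_params))
    b = sum(p * b_i for p, (_, b_i) in zip(phi, local_params))
    return a, b

def simulate(u, u_p, local_params, centers, sigmas):
    """y(k) = -a y(k-1) + b u(k-1) with a, b evaluated at the operating point u_p(k-1)."""
    y = [0.0]
    for k in range(1, len(u)):
        a, b = linearized_params(u_p[k - 1], local_params, centers, sigmas)
        y.append(-a * y[-1] + b * u[k - 1])
    return y

# Two hypothetical local models (a_i, b_i), valid around u_p = 0 and u_p = 1
LOCAL = [(-0.9, 0.1), (-0.5, 0.5)]
CENTERS, SIGMAS = [0.0, 1.0], [0.3, 0.3]

for op in (0.0, 0.5, 1.0):
    a, b = linearized_params(op, LOCAL, CENTERS, SIGMAS)
    print(f"u_p = {op}: a = {a:.4f}, b = {b:.4f}, static gain = {b / (1 + a):.4f}")
```

Both hypothetical local models have static gain 1, so the interpolated model also has gain 1 at every operating point, while its dynamics move between those of the two local models.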
Fig. 10.10. Scheme for dynamic linearization of the TS fuzzy model or local linear neural model. $u_c$: inputs for the local linear models; $u_p$: inputs determining the operating-point dependent behavior.


b) Fault-detection scheme for sensor faults 

Generation of structured parity equations with TS models 

As shown above, a linearized model with varying parameters can be derived from the TS fuzzy model using (10.73). These linearized models are now used for the generation of structured residuals. Assume a nonlinear MIMO system with $r$ outputs $y_1, \dots, y_r$ and $m$ inputs, of order $n$

$$ \begin{aligned} y(k) &= f_{NL}(u(k), y(k)) \\ y(k) &= [y_1(k), y_2(k), \dots, y_r(k)]^T \\ u(k) &= [u_1(k), u_2(k), \dots, u_m(k)]^T \end{aligned} \tag{10.75} $$

For each output $y_j(k)$ of $f_{NL}(\cdot)$, a local linear model in the form of (10.71) can be identified, which generally depends on all inputs and outputs of the process. The model can then be described in the form

$$ \begin{aligned} A_{11}(z)\,y_1(z) &= B_{11}(z)\,u_1(z) + \dots + A_{1r}(z)\,y_r(z) + c_0^{(1)} \\ &\;\;\vdots \\ A_{rr}(z)\,y_r(z) &= B_{r1}(z)\,u_1(z) + \dots + A_{r(r-1)}(z)\,y_{r-1}(z) + c_0^{(r)} \end{aligned} \tag{10.76} $$

$A_{ij}(z)$ and $B_{ij}(z)$ are moving average (MA) filters with parameters $a_{ij}$, $b_{ij}$ and $c_0^{(i)}$, which depend on $u_p$, similarly to (10.74). (10.76) can also be written in vector form

$$ r(z) = A_1(z)\,y_1(z) + \dots + A_r(z)\,y_r(z) - B_1(z)\,u_1(z) - \dots - B_m(z)\,u_m(z) - c_0 \tag{10.77} $$

with the residual vector $r(z)$ and the vectors

$$ A_i(z) = [A_{1i}(z) \cdots A_{ri}(z)]^T, \quad B_i(z) = [B_{1i}(z) \cdots B_{ri}(z)]^T, \quad r(z) = [r_1 \cdots r_r]^T \tag{10.78} $$
The residuals of (10.77) depend on all inputs and outputs of the system. In order to generate structured residuals, (10.77) has to be transformed to decouple the residuals from the different input and output measurements. Therefore, both sides of (10.77) are multiplied by a residual generator $w(z)$:

$$ r^*(z) = w^T(z)\,r(z) = w^T(z)\left(A_1(z)\,y_1(z) + \dots - B_m(z)\,u_m(z) - c_0\right) \tag{10.79} $$

$w(z)$ is a vector of length $r$ with MA filters $w_j(z)$ as elements. As can be seen from (10.77), choosing $w(z)$ to satisfy the condition

$$ w^T(z)\,B_i(z) = 0 \quad \text{or} \quad w^T(z)\,A_j(z) = 0 \tag{10.80} $$

leads to residuals which are decoupled from input $u_i$ or from output $y_j$, respectively. Note that it is always possible to find a $w^T(z)$ that satisfies (10.80), but sometimes the residual generators for different inputs and outputs are equal, and no isolation between these faults is possible. The single residual generator vectors form the rows of a residual generator matrix $W(z)$, and (10.79) becomes


r*(z) = W(z) (A,-(z) >>, (z) 4-B,„(z) u m (z) - c 0 

f W\ i (z) ••• Wir(z) 1 


W(z) = 


W lr (z) 


Wrr(z) 


(10.81) 


As mentioned above, the vectors A,- and B, depend on the rule premise inputs \x p . As 
the same holds for the elements of the residual generator, the residual generator has 
to be calculated online for each sample interval. The residual generation is shown in 
Figure 10.11 for both structures. 
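For $r = 2$ outputs, a vector satisfying (10.80) can be written down directly: $w^T(z) = [X_2(z)\;\; -X_1(z)]$ annihilates any column $[X_1(z)\;\; X_2(z)]^T$, in the same way as $w_u^T = [K_2\;\; -K_1]$ in Example 10.2. The Python sketch below represents polynomials in $z^{-1}$ by coefficient lists and recomputes the generator for several operating points, as required when $B_i$ depends on $u_p$. The operating-point dependence in `input_columns`, its weighting `phi` and all function names are hypothetical.

```python
def poly_mul(p, q):
    """Product of two polynomials in z^-1 given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    p, q = p + [0.0] * (n - len(p)), q + [0.0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def decoupling_vector(column):
    """For r = 2: w^T = [X2, -X1] gives w^T [X1, X2]^T = X2 X1 - X1 X2 = 0."""
    x1, x2 = column
    return [list(x2), [-c for c in x1]]

def apply(w, column):
    """Polynomial w^T(z) column(z)."""
    acc = [0.0]
    for w_j, x_j in zip(w, column):
        acc = poly_add(acc, poly_mul(w_j, x_j))
    return acc

def input_columns(phi):
    """Hypothetical operating-point dependent columns B_1(z), B_2(z)."""
    B1 = ([0.0, 0.3 + 0.2 * phi], [0.0, 0.1, 0.05 * phi])
    B2 = ([0.0, 0.4], [0.0, 0.2 - 0.1 * phi])
    return B1, B2

# Recompute the residual generator at each sample, as the parameters move with u_p
for phi in (0.0, 0.5, 1.0):
    B1, B2 = input_columns(phi)
    w = decoupling_vector(B1)                 # decouples the residual from u_1
    print(phi, apply(w, B1), apply(w, B2))    # first all zero, second generally not
```

The residual obtained with this $w$ is decoupled from $u_1$ but remains sensitive to $u_2$, and hence to a fault on $u_2$.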




Fig. 10.11. Parallel and series-parallel model, and output residuals: (a) series-parallel model; (b) parallel model


Parallel and series-parallel model

There are two principal ways of modelling dynamic systems with an auto-regressive part. The first is the series-parallel model (also called the equation error model), which uses previous process outputs $y(k-d)$ for the prediction of the next process output. The second is the parallel model (or output error model), where previous model outputs are fed back, Figure 10.11. Due to the different structures, the derived residuals show different behavior, [10.13]. In the series-parallel residual, measured noisy output signals are used instead of noise-free estimated ones, and hence its signal-to-noise ratio is smaller. Therefore, series-parallel residuals often have to be low-pass filtered to separate the residual deviation caused by a fault from that caused by noise. On the other hand, parallel models are likely to drift, due to model uncertainties and the auto-regressive part. Thus, the performance of the parallel model is worse, and the model can even become unstable, [10.22]. The pros and cons are as follows:

Parallel-model parity equations 

+ even smaller faults can lead to higher residual deflections; 

+ signal-to-noise ratio is relatively high;

+ no filter is needed; 

- model is difficult to build, and its performance is worse; 

- models are likely to drift due to imprecise modelling;

- generated parity equations have an auto-regressive part. 

Series-parallel-model parity equations 

+ lead to high peaks, when sudden changes occur; 

+ very fast response, but only small deflection; 

+ no auto-regressive part in the parity equation; 

+ no drift effects → output error is smaller;

- possess only a small signal-to-noise ratio; 

- need for low-pass filtering. 
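The contrasting behavior listed above can be reproduced with a minimal first-order example. The sketch assumes the hypothetical model $y(k) = a\,y(k-1) + b\,u(k-1)$ with exactly known parameters and a sensor offset that appears at $k = 100$. The series-parallel (equation error) residual uses the measured $y(k-1)$, whereas the parallel (output error) residual feeds back the model output.

```python
import numpy as np

def residuals(u, y, a, b):
    """Series-parallel and parallel residuals of the hypothetical model
    y(k) = a*y(k-1) + b*u(k-1)."""
    n = len(y)
    r_sp = np.zeros(n)            # equation error (series-parallel)
    r_p = np.zeros(n)             # output error (parallel)
    y_model = np.zeros(n)         # parallel model feeds back its own output
    for k in range(1, n):
        r_sp[k] = y[k] - a * y[k - 1] - b * u[k - 1]      # uses measured y(k-1)
        y_model[k] = a * y_model[k - 1] + b * u[k - 1]     # uses model output
        r_p[k] = y[k] - y_model[k]
    return r_sp, r_p

a, b, n = 0.9, 0.1, 200
u = np.ones(n)
y = np.zeros(n)
for k in range(1, n):             # fault-free process
    y[k] = a * y[k - 1] + b * u[k - 1]
y_meas = y.copy()
y_meas[100:] += 0.2               # hypothetical additive sensor offset from k = 100

r_sp, r_p = residuals(u, y_meas, a, b)
print(r_sp[100], r_sp[150])       # approx. 0.2 (peak), then (1 - a) * 0.2 = 0.02
print(r_p[150])                   # approx. 0.2: the full offset persists
```

The series-parallel residual shows the sharp peak followed by a small deflection, while the parallel residual keeps the full offset; with measurement noise added, the equation error would also carry the larger noise share noted in the lists.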

In order to improve the performance of the overall fault-detection scheme, both approaches can be combined. The sensitivity is computed, and the most sensitive residuals are selected. An application for the detection of sensor faults of a nonlinear heat exchanger is described in [10.2] and [10.18]. The application of neuro-fuzzy models with local linear models and operating-point-dependent parameters for the fault detection of a pump-pipe system is shown in [10.26] and [10.27].


10.5 Parameter estimation with parity equations 

The consideration of the sensitivity of the residuals of parity equations in Section 10.3.3 has shown that, for a small change $\Delta p_j$ of a parameter $p_j$, the residuals yield

$$r(t) = -\frac{\partial r(t)}{\partial p_j}\,\Delta p_j = -\varphi_j(t)\,\Delta p_j \qquad (10.82)$$

with the scalar (residual sensitivity)

$$\varphi_j(t) = \frac{\partial r(t)}{\partial p_j} \qquad (10.83)$$

which follows from the process model. The parameter changes $\Delta p_j$ can be determined by a least squares estimation. Therefore (10.82) is written as

$$r(k) = -\varphi_j(k)\,\Delta p_j + e(k)$$

where $e(k)$ is an equation error and $k = t/T_0 = 0, 1, 2, \ldots$ the discrete sampling time. By using $N$ measured samples

$$\mathbf{r}(k) = [r(k-1)\;\; r(k-2)\;\; \cdots\;\; r(k-N)]^T \qquad (10.84)$$

$$\boldsymbol{\varphi}_j(k) = [\varphi_j(k-1)\;\; \varphi_j(k-2)\;\; \cdots\;\; \varphi_j(k-N)]^T \qquad (10.85)$$

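the nonrecursive least squares estimate is, according to Section 9.2.1,

$$\Delta\hat{p}_j(k) = -\left[\boldsymbol{\varphi}_j^T(k)\,\boldsymbol{\varphi}_j(k)\right]^{-1}\boldsymbol{\varphi}_j^T(k)\,\mathbf{r}(k) = -\frac{\sum_{v=1}^{N}\varphi_j(k-v)\,r(k-v)}{\sum_{v=1}^{N}\varphi_j^2(k-v)} \qquad (10.86)$$

This estimation performs a correlation of the residual sensitivity $\varphi_j$ to the parameter $p_j$ with the residual $r$.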
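Equation (10.86) is a single correlation ratio and can be written in a few lines. The sketch assumes a synthetic, hypothetical sensitivity signal $\varphi_j(k)$ and generates residuals from $r(k) = -\varphi_j(k)\,\Delta p_j + e(k)$ with a small white equation error; it only illustrates the arithmetic of (10.86), not a particular process.

```python
import numpy as np

def estimate_delta_p(r, phi):
    """Nonrecursive LS estimate (10.86) from the model r(k) = -phi(k) * dp + e(k)."""
    r = np.asarray(r, dtype=float)
    phi = np.asarray(phi, dtype=float)
    return -np.dot(phi, r) / np.dot(phi, phi)

rng = np.random.default_rng(0)
N = 500
phi = rng.uniform(-2.0, 2.0, size=N)      # hypothetical residual sensitivity
delta_p_true = 0.3                        # hypothetical parameter change
r = -phi * delta_p_true + 0.01 * rng.standard_normal(N)

delta_p_hat = estimate_delta_p(r, phi)
print(delta_p_hat)                        # close to 0.3
```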

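A corresponding recursive parameter estimation with exponential forgetting follows according to (9.98), (9.99) and (9.100):

$$\Delta\hat{p}_j(k+1) = \Delta\hat{p}_j(k) + \gamma(k)\left[r(k+1) + \varphi_j(k+1)\,\Delta\hat{p}_j(k)\right] \qquad (10.87)$$

$$\gamma(k) = -\frac{P(k)\,\varphi_j(k+1)}{P(k)\,\varphi_j^2(k+1) + \lambda} \qquad (10.88)$$

$$P(k+1) = \left[1 + \gamma(k)\,\varphi_j(k+1)\right] P(k)\,\frac{1}{\lambda} \qquad (10.89)$$

with the forgetting factor $\lambda < 1.0$ and the starting value $P(0) = \alpha$ ($\alpha > 1000$).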
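The recursion (10.87) to (10.89) can be sketched as follows for a scalar parameter deviation. The forgetting factor, the starting value $\alpha$ and the test signals are hypothetical choices; the hypothetical deviation jumps from 0.3 to 0.5 halfway through to show the tracking effect of $\lambda < 1$.

```python
import numpy as np

def rls_delta_p(r_seq, phi_seq, lam=0.95, alpha=1e4):
    """Recursive LS with exponential forgetting following (10.87)-(10.89).
    r_seq[k] and phi_seq[k] play the roles of r(k+1) and phi_j(k+1)."""
    dp, P = 0.0, alpha                                   # P(0) = alpha > 1000
    estimates = []
    for r_next, phi_next in zip(r_seq, phi_seq):
        gamma = -P * phi_next / (P * phi_next**2 + lam)  # (10.88)
        dp = dp + gamma * (r_next + phi_next * dp)       # (10.87)
        P = (1.0 + gamma * phi_next) * P / lam           # (10.89)
        estimates.append(dp)
    return np.array(estimates)

rng = np.random.default_rng(1)
phi = rng.uniform(0.5, 2.0, size=600)                    # hypothetical sensitivity signal
dp_true = np.where(np.arange(600) < 300, 0.3, 0.5)       # parameter drifts at k = 300
r = -phi * dp_true                                       # noise-free residuals for clarity

est = rls_delta_p(r, phi)
print(est[299], est[-1])                                 # approx. 0.3 and 0.5
```

A smaller $\lambda$ tracks faster but makes the estimate more sensitive to noise, which is the usual trade-off for forgetting factors.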

For each parity equation with one residual $r_j(k)$, therefore, one parameter deviation $\Delta p_j(k)$ can be estimated. This may be used for adapting one parameter of the fault-detection procedure with parity equations if one knows that this single parameter is time-varying due to normal operating conditions. However, if faults are expected which change parameters, then direct parameter estimation of all process parameters should be preferred.


Example 10.4: Adaptive parity equations for a DC motor 

The residual for the electrical part of the DC motor is, see Example 10.3,

$$r_4(t) = \boldsymbol{\theta}^T(p)\,\boldsymbol{\psi}(t) = U_A(t) - L_A\,\dot{I}_A(t) - R_A\,I_A(t) - \Psi\,\omega(t)$$

with

$$\boldsymbol{\theta}^T(p) = [1\;\; L_A\;\; R_A\;\; \Psi], \qquad \boldsymbol{\psi}^T(t) = [U_A(t)\;\; -\dot{I}_A(t)\;\; -I_A(t)\;\; -\omega(t)]$$

As the armature resistance depends strongly on the temperature, this parameter is time-varying, depending on the load and cooling of the motor. Therefore $R_A$ will be estimated. (10.83) leads to


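$$\varphi_{R_A}(t) = \frac{\partial r_4(t)}{\partial R_A} = -I_A(t)$$

The estimation equation then follows from (10.86)

$$\Delta\hat{R}_A(k) = \frac{\sum_{v=1}^{N} I_A(k-v)\,r_4(k-v)}{\sum_{v=1}^{N} I_A^2(k-v)}$$

Correspondingly, one obtains from residual $r_1$ for the friction parameter

$$\Delta\hat{M}_F = \frac{\sum_{v=1}^{N} \omega(k-v)\,r_1(k-v)}{\sum_{v=1}^{N} \omega^2(k-v)}$$

see [10.14].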

Figure 10.12 shows a simulation run for a DC motor with $U_A = 25$ V = const, $M_{load} = 2$ Nm = const and with an increase of the armature resistance from $R_A = 1.53\ \Omega$ to $R_A = 1.72\ \Omega$ due to an increase of temperature during the first 50 min of operation. Without adapting the resistance parameter, the residual exceeds a threshold after about 10 min. With parameter adaptation of $R_A$ in the residual equation every 5 min, the residual is set back to zero and a false alarm is avoided, see also the experimental investigations in Chapter 20.
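One adaptation step of this example can be sketched numerically. The signals $I_A$ and $\omega$, the inductance $L_A$, the flux linkage $\Psi$ and the sampling time are hypothetical; only the resistance values 1.53 Ω and 1.72 Ω are taken from the simulation described above. The residual $r_4$ is evaluated with the nominal $R_A$, $\Delta\hat{R}_A$ is obtained from the correlation formula, and the corrected $R_A$ brings the residual back to zero. In the described run this step would be repeated every 5 min.

```python
import numpy as np

L_A, PSI = 0.005, 0.4            # hypothetical inductance [H] and flux linkage [Vs]
R_nominal, R_true = 1.53, 1.72   # armature resistance before/after warming-up [Ohm]
T0 = 1e-3                        # hypothetical sampling time [s]
t = np.arange(2000) * T0
I_A = 1.3 + 0.3 * np.sin(2 * np.pi * 2.0 * t)       # hypothetical armature current [A]
omega = 50.0 + 5.0 * np.cos(2 * np.pi * 1.0 * t)    # hypothetical speed [rad/s]
dI_A = np.gradient(I_A, T0)                         # numerical derivative of I_A
U_A = L_A * dI_A + R_true * I_A + PSI * omega       # voltage of the warmed-up motor

def residual_r4(R_A):
    """r4 = U_A - L_A dI_A/dt - R_A I_A - Psi omega, evaluated with model value R_A."""
    return U_A - L_A * dI_A - R_A * I_A - PSI * omega

r4 = residual_r4(R_nominal)
delta_R_A = np.dot(I_A, r4) / np.dot(I_A, I_A)      # correlation estimate of Delta R_A
R_adapted = R_nominal + delta_R_A
print(round(delta_R_A, 4))                          # 0.19
print(np.max(np.abs(residual_r4(R_adapted))))       # practically zero after adaptation
```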

□ 


10.6 Problems 

1) Derive the equations for the primary residuals of parity equations with output error and equation error in the form of differential equations for the linear second-order mass-spring-damper system of Example 5.5. Which measures have to be taken to arrive at realizable solutions?

2) Solve the same tasks as in 1), but with discrete-time signals, z-transfer functions and difference equations. Compare the realizability with 1).

3) What are the advantages of structured residuals compared to primary residuals? 

4) What is the difference between the evaluation form and computational form of 
parity equations? Write the corresponding equations for transfer functions. 
Fig. 10.12. Simulated residual $r_4(t)$ of the DC motor for increasing armature resistance due to warming-up, without and with adaptation (according to I. Unger)

5) An electrical drive system for electrical steering of vehicles with a gear in series connection is given with simplified transfer functions:
The DC current $I(t)$, the speed $\omega(t)$ and the position $y(t)$ of a rack can be measured. The task consists in the fault detection of sensor faults of all three signals and of faults in bearings and windings of the DC motor.

a) Derive equations for structured residuals. 

b) Show the effect of offset sensor faults in a fault-symptom table, if no process 
faults happen. 

c) Are the sensor faults strongly or weakly isolable? 

d) How do the residuals react to process faults, i.e. a change of $F_j$ and $K_g$? Add the fault-symptom table. Which faults can be diagnosed?

e) Is a slowly drifting fault of the position sensor detectable? 
Fault detection with state observers and state estimation


As state observers use an output error between a measured process output and an adjustable model output, they are a further alternative for model-based fault detection. It is assumed, as in the case of parity equation approaches, that the structure and the parameters of the model are precisely known. State observers adjust the state variables according to the initial conditions and to the time course of the measured input and output signals.

Several approaches have been proposed for fault detection which are based on the 
classical Luenberger state observer, Kalman filter and the so-called output observer. 
Some of the basic methods are treated in this chapter. 


11.1 State observers 

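A linear time-invariant process is considered which can be described by the state-space model

$$\dot{\mathbf{x}}(t) = \mathbf{A}\,\mathbf{x}(t) + \mathbf{B}\,\mathbf{u}(t) \qquad (11.1)$$

$$\mathbf{y}(t) = \mathbf{C}\,\mathbf{x}(t) \qquad (11.2)$$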

Here $p$ input signals $\mathbf{u}(t)$ and $r$ output signals $\mathbf{y}(t)$ are assumed, because the fault-detection methods are especially suitable for multi-variable processes. With the assumption that the structure and the parameters of the model are known, a state observer is used to reconstruct the unmeasurable state variables based on measured inputs and outputs

$$\dot{\hat{\mathbf{x}}}(t) = \mathbf{A}\,\hat{\mathbf{x}}(t) + \mathbf{B}\,\mathbf{u}(t) + \mathbf{H}\,\mathbf{e}(t) \qquad (11.3)$$

$$\mathbf{e}(t) = \mathbf{y}(t) - \mathbf{C}\,\hat{\mathbf{x}}(t) \qquad (11.4)$$

compare Figure 11.1. $\mathbf{e}(t)$ is an output error which acts through the observer matrix $\mathbf{H}$ on the reconstructed state derivatives $\dot{\hat{\mathbf{x}}}(t)$. Inserting (11.4) in (11.3) yields the implementation form of the state observer

$$\dot{\hat{\mathbf{x}}}(t) = [\mathbf{A} - \mathbf{H}\,\mathbf{C}]\,\hat{\mathbf{x}}(t) + \mathbf{B}\,\mathbf{u}(t) + \mathbf{H}\,\mathbf{y}(t) \qquad (11.5)$$

where it is assumed that the system is observable.



Fig. 11.1. Process and state observer 


The state error

$$\tilde{\mathbf{x}}(t) = \mathbf{x}(t) - \hat{\mathbf{x}}(t) \qquad (11.6)$$

between the real process states and the observed states becomes, under the assumption that process and model parameters are identical and by introducing (11.1) and (11.5),

$$\dot{\tilde{\mathbf{x}}}(t) = [\mathbf{A} - \mathbf{H}\,\mathbf{C}]\,\tilde{\mathbf{x}}(t) \qquad (11.7)$$

Hence, the state error vanishes asymptotically,

$$\lim_{t\to\infty} \tilde{\mathbf{x}}(t) = \mathbf{0}$$

for any initial state deviation $[\mathbf{x}(0) - \hat{\mathbf{x}}(0)]$ if the observer is stable, which can be achieved by proper design of the observer feedback matrix $\mathbf{H}$, e.g. by pole placement.
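Pole placement for $\mathbf{H}$ can be illustrated for a single-output process with Ackermann's formula in its dual (observer) form, $\mathbf{h} = p(\mathbf{A})\,\mathbf{O}^{-1}\,[0\;\cdots\;0\;1]^T$, where $\mathbf{O}$ is the observability matrix with rows $\mathbf{c}^T, \mathbf{c}^T\mathbf{A}, \ldots$ and $p(s)$ is the desired characteristic polynomial. The process matrices and pole locations in the sketch are hypothetical, and the explicit Euler integration of (11.7) only serves to show that the state error decays.

```python
import numpy as np

def observer_gain_ackermann(A, c, poles):
    """Gain h with eig(A - h c^T) = poles for an observable single-output pair (A, c^T)."""
    n = A.shape[0]
    c = np.atleast_2d(c)
    O = np.vstack([c @ np.linalg.matrix_power(A, i) for i in range(n)])  # observability matrix
    pA = np.eye(n, dtype=complex)
    for s in poles:                        # desired characteristic polynomial evaluated at A
        pA = pA @ (A - s * np.eye(n))
    e_n = np.zeros((n, 1))
    e_n[-1, 0] = 1.0
    return np.real(pA @ np.linalg.solve(O, e_n))

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # hypothetical process, poles -1 and -2
c = np.array([1.0, 0.0])                   # one measured output
H = observer_gain_ackermann(A, c, [-5.0, -6.0])
F = A - H @ np.atleast_2d(c)               # observer dynamics A - H C
print(H.ravel())                           # approx. [8, 4]
print(np.sort(np.linalg.eigvals(F).real))  # approx. [-6, -5]

# State error (11.7) from an initial deviation, explicit Euler integration over 3 s
x_err, dt = np.array([1.0, -1.0]), 1e-3
for _ in range(3000):
    x_err = x_err + dt * (F @ x_err)
print(np.linalg.norm(x_err))               # practically zero
```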

11.1.1 Additive faults 

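The process is now influenced by unmeasurable disturbances $\mathbf{v}(t)$ and $\mathbf{n}(t)$ and additive faults $\mathbf{f}_l(t)$ and $\mathbf{f}_m(t)$ as follows, compare Figure 11.2:

$$\dot{\mathbf{x}}(t) = \mathbf{A}\,\mathbf{x}(t) + \mathbf{B}\,\mathbf{u}(t) + \mathbf{V}\,\mathbf{v}(t) + \mathbf{L}\,\mathbf{f}_l(t) \qquad (11.8)$$

$$\mathbf{y}(t) = \mathbf{C}\,\mathbf{x}(t) + \mathbf{N}\,\mathbf{n}(t) + \mathbf{M}\,\mathbf{f}_m(t) \qquad (11.9)$$

$\mathbf{L}$ and $\mathbf{M}$ are fault entry matrices. Introducing these process equations into the observer equation (11.5) leads to the state error

$$\dot{\tilde{\mathbf{x}}}(t) = [\mathbf{A} - \mathbf{H}\,\mathbf{C}]\,\tilde{\mathbf{x}}(t) + \mathbf{V}\,\mathbf{v}(t) + \mathbf{L}\,\mathbf{f}_l(t) - \mathbf{H}\,\mathbf{N}\,\mathbf{n}(t) - \mathbf{H}\,\mathbf{M}\,\mathbf{f}_m(t) \qquad (11.10)$$

and the output error becomes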
Fig. 11.2. Multi-variable process with disturbances $\mathbf{v}$, $\mathbf{n}$ and fault signals $\mathbf{f}_l$, $\mathbf{f}_m$

$\mathbf{u}$: process input vector [p × 1]
$\mathbf{v}$: input disturbance vector [m × 1]
$\mathbf{x}$: process state vector [m × 1]
$\mathbf{f}_l$: additive input fault vector [m × 1]
$\mathbf{y}$: process output vector [r × 1]
$\mathbf{n}$: output disturbance vector [r × 1]
$\mathbf{f}_m$: additive output fault vector [r × 1]


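$$\mathbf{e}(t) = \mathbf{y}(t) - \mathbf{C}\,\hat{\mathbf{x}}(t) = \mathbf{C}\,\tilde{\mathbf{x}}(t) + \mathbf{N}\,\mathbf{n}(t) + \mathbf{M}\,\mathbf{f}_m(t) \qquad (11.11)$$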

After the initial state deviations $[\mathbf{x}(0) - \hat{\mathbf{x}}(0)]$ have vanished asymptotically, the state error $\tilde{\mathbf{x}}$ and the output error $\mathbf{e}(t)$ depend on the disturbances $\mathbf{v}(t)$ and $\mathbf{n}(t)$ and the faults $\mathbf{f}_l(t)$ and $\mathbf{f}_m(t)$. $\tilde{\mathbf{x}}$ can be used as residuals if primary faults $\mathbf{f}_l(t)$ on the states (as for leak detection) are of interest. In general, however, the output error $\mathbf{e}(t) = \mathbf{r}(t)$ will be used as residual. The residual is zero if no disturbances and faults are present, and it deviates from zero if faults $\mathbf{f}_l(t)$ or $\mathbf{f}_m(t)$ appear (and also if $\mathbf{n}(t) \neq \mathbf{0}$ or $\mathbf{v}(t) \neq \mathbf{0}$). It is interesting to see that the residuals do not depend on the input signal $\mathbf{u}(t)$.

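To obtain the input-output relation of the state observer, (11.5) is Laplace transformed

$$[s\mathbf{I} - \mathbf{A} + \mathbf{H}\,\mathbf{C}]\,\hat{\mathbf{x}}(s) = \mathbf{B}\,\mathbf{u}(s) + \mathbf{H}\,\mathbf{y}(s) \qquad (11.12)$$

which leads to

$$\hat{\mathbf{x}}(s) = [s\mathbf{I} - \mathbf{A} + \mathbf{H}\,\mathbf{C}]^{-1}\,[\mathbf{B}\,\mathbf{u}(s) + \mathbf{H}\,\mathbf{y}(s)] \qquad (11.13)$$

Introduction into the output error residual (11.4) yields

$$\mathbf{r}(s) = \mathbf{e}(s) = -\mathbf{C}\,[s\mathbf{I} - \mathbf{A} + \mathbf{H}\,\mathbf{C}]^{-1}\,\mathbf{B}\,\mathbf{u}(s) + \left[\mathbf{I} - \mathbf{C}\,[s\mathbf{I} - \mathbf{A} + \mathbf{H}\,\mathbf{C}]^{-1}\,\mathbf{H}\right]\mathbf{y}(s) \qquad (11.14)$$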

This is the Laplace-transformed computational form of the residuals for state observers. As (11.10) and (11.11) show the influence of the faults on the residuals $\tilde{\mathbf{x}}(t)$ or $\mathbf{e}(t)$, they correspond to the internal form of parity equations, compare (10.8). Applying the Laplace transform to the state observer equations with additive faults (11.10) and (11.11) and omitting the disturbance terms leads to

$$\mathbf{r}(s) = \mathbf{e}(s) = \mathbf{C}\,[s\mathbf{I} - (\mathbf{A} - \mathbf{H}\,\mathbf{C})]^{-1}\,[\mathbf{L}\,\mathbf{f}_l(s) - \mathbf{H}\,\mathbf{M}\,\mathbf{f}_m(s)] + \mathbf{M}\,\mathbf{f}_m(s) \qquad (11.15)$$
Additive faults $\mathbf{f}_l$ and $\mathbf{f}_m$ act on the output error $\mathbf{e}$ according to the observer dynamics $[s\mathbf{I} - (\mathbf{A} - \mathbf{H}\,\mathbf{C})]^{-1}$, whereas $\mathbf{f}_m$ additionally acts directly on $\mathbf{e}$. The static deviation for step changes $\mathbf{f}_{l0}$ and $\mathbf{f}_{m0}$ becomes

$$\lim_{t\to\infty} \mathbf{e}(t) = \lim_{s\to 0} s\,\mathbf{e}(s) = \mathbf{C}\,[\mathbf{H}\,\mathbf{C} - \mathbf{A}]^{-1}\,[\mathbf{L}\,\mathbf{f}_{l0} - \mathbf{H}\,\mathbf{M}\,\mathbf{f}_{m0}] + \mathbf{M}\,\mathbf{f}_{m0} \qquad (11.16)$$

Therefore the output error shows a constant remaining offset for stepwise faults $\mathbf{f}_l$ and $\mathbf{f}_m$ at the input and output of the process. The state error $\tilde{\mathbf{x}}$ also shows a constant offset, see (11.10).
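The static deviation (11.16) can be evaluated directly and cross-checked against a simulation of (11.10) and (11.11). The sketch uses a hypothetical second-order process with observer gain $\mathbf{H} = [8\;\;4]^T$ (observer poles $-5$ and $-6$); the fault entry matrices and the step sizes $f_{l0}$, $f_{m0}$ are likewise hypothetical, and the disturbances are omitted.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])
H = np.array([[8.0], [4.0]])          # observer gain, eig(A - H C) = -5, -6
L = np.array([[0.0], [1.0]])          # hypothetical input fault entry
M = np.array([[1.0]])                 # hypothetical output fault entry
f_l0, f_m0 = np.array([0.5]), np.array([0.1])

# Static deviation according to (11.16)
e_inf = C @ np.linalg.solve(H @ C - A, L @ f_l0 - H @ M @ f_m0) + M @ f_m0
print(e_inf)                          # approx. [0.02333]

# Cross-check: integrate the state error (11.10) and form the output error (11.11)
x_err, dt = np.zeros(2), 1e-3
for _ in range(5000):
    x_err = x_err + dt * ((A - H @ C) @ x_err + L @ f_l0 - H @ M @ f_m0)
e_sim = C @ x_err + M @ f_m0
print(e_sim)                          # agrees with e_inf
```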

A comparison of the output error residual of the state observer, (11.14), with the output error residual of the parity equation approach (10.7) shows similarities. Instead of the process transfer function $G_p(s)$, the observer dynamics $[s\mathbf{I} - (\mathbf{A} - \mathbf{H}\,\mathbf{C})]^{-1}$ act between additive input faults and the output error.

11.1.2 Multiplicative faults 

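If faults appear as changes $\Delta\mathbf{A}$, $\Delta\mathbf{B}$ or $\Delta\mathbf{C}$ of the parameter matrices, the process behavior becomes (after the transients have settled)

$$\dot{\mathbf{x}}(t) = [\mathbf{A} + \Delta\mathbf{A}]\,\mathbf{x}(t) + [\mathbf{B} + \Delta\mathbf{B}]\,\mathbf{u}(t) \qquad (11.17)$$

$$\mathbf{y}(t) = [\mathbf{C} + \Delta\mathbf{C}]\,\mathbf{x}(t) \qquad (11.18)$$

and the state and output errors without disturbances are

$$\dot{\tilde{\mathbf{x}}}(t) = [\mathbf{A} - \mathbf{H}\,\mathbf{C}]\,\tilde{\mathbf{x}}(t) + [\Delta\mathbf{A} - \mathbf{H}\,\Delta\mathbf{C}]\,\mathbf{x}(t) + \Delta\mathbf{B}\,\mathbf{u}(t) \qquad (11.19)$$

$$\mathbf{e}(t) = \mathbf{C}\,\tilde{\mathbf{x}}(t) + \Delta\mathbf{C}\,\mathbf{x}(t) \qquad (11.20)$$

Hence, the state and output errors depend on the parameter changes multiplied by the input signal $\mathbf{u}(t)$ and the state variables $\mathbf{x}(t)$. Therefore the analysis of the behavior is not as straightforward as for additive faults.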

11.1.3 Fault isolation with state observers 

If the output error $\mathbf{e}(t) = \mathbf{r}(t)$ is used as primary residual, its elements will deflect depending on the faults, as shown e.g. by (11.15). In order to isolate the single faults, enhanced residuals have to be generated, as in the case of parity equations, Section 10.3. This can be achieved by fault-detection filters or by dedicated observers, mostly in the form of a bank of observers. Note that these approaches require multiple process outputs.
Fault-detection filters for multi-output processes (fault-sensitive observers)

There is some freedom in the design of the observer feedback matrix $\mathbf{H}$. Usually the poles $s_i$ of the characteristic equation of the observer

$$\det[s\mathbf{I} - \mathbf{A} + \mathbf{H}\,\mathbf{C}] = (s - s_1)(s - s_2)\cdots(s - s_m) = 0 \qquad (11.21)$$

are selected such that they lie in the left half s-plane in order to guarantee stable behavior, and such that they are faster and better damped than the process poles. This means they will be shifted to the left compared to the process poles. However, if the gains become too large, the output noise $\mathbf{n}(t)$ will be amplified too much. Therefore a proper compromise has to be found.

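Now the observer feedback matrix $\mathbf{H}$ can also be used to give the residuals $\mathbf{r}$ a proper structure for fault isolation, as proposed by [11.3] and [11.16] in the form of a fault-sensitive observer. The considered state equation is

$$\dot{\mathbf{x}} = \mathbf{A}\,\mathbf{x} + \mathbf{B}\,\mathbf{u} + \mathbf{l}_i\,f_{li} \qquad (11.22)$$

For each fault $f_{li}$ a fault influence vector $\mathbf{l}_i$ is determined, such that $m$ linearly independent vectors $\mathbf{l}_i$ result. As residual the output error

$$\mathbf{r} = \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} \qquad (11.23)$$

is used. It is now assumed that the system is completely measurable. Therefore from the output equation

$$\mathbf{y} = \mathbf{C}\,\mathbf{x}, \qquad \operatorname{rank}\mathbf{C} = m \qquad (11.24)$$

it follows

$$\mathbf{x} = \mathbf{C}^{-1}\,\mathbf{y} \qquad (11.25)$$

(which means that all states are measurable). Introducing (11.25) in (11.22) leads to

$$\dot{\mathbf{y}} = \mathbf{C}\,\mathbf{A}\,\mathbf{C}^{-1}\,\mathbf{y} + \mathbf{C}\,\mathbf{B}\,\mathbf{u} + \mathbf{C}\,\mathbf{l}_i\,f_{li} \qquad (11.26)$$

For the observer it holds

$$\dot{\hat{\mathbf{x}}} = \mathbf{A}\,\hat{\mathbf{x}} + \mathbf{B}\,\mathbf{u} + \mathbf{H}\,\mathbf{r} \qquad (11.27)$$

and with (11.25)

$$\dot{\hat{\mathbf{y}}} = \mathbf{C}\,\mathbf{A}\,\mathbf{C}^{-1}\,\hat{\mathbf{y}} + \mathbf{C}\,\mathbf{B}\,\mathbf{u} + \mathbf{C}\,\mathbf{H}\,\mathbf{r} \qquad (11.28)$$

where the states are now eliminated.

Furthermore, with (11.26) and (11.28) the residual (11.23) becomes

$$\dot{\mathbf{r}} = [\mathbf{C}\,\mathbf{A}\,\mathbf{C}^{-1} - \mathbf{C}\,\mathbf{H}]\,\mathbf{r} + \mathbf{C}\,\mathbf{l}_i\,f_{li} \qquad (11.29)$$

To decouple the residuals from each other, a diagonal matrix with fast, equal eigenvalues $\lambda$ is introduced

$$\mathbf{C}\,\mathbf{A}\,\mathbf{C}^{-1} - \mathbf{C}\,\mathbf{H} = \lambda\,\mathbf{I} \qquad (11.30)$$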
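Laplace transformation of (11.29) leads to

$$\mathbf{r}(s) = [s\mathbf{I} - \lambda\mathbf{I}]^{-1}\,\mathbf{C}\,\mathbf{l}_i\,f_{li}(s) = \frac{\mathbf{C}\,\mathbf{l}_i}{s - \lambda}\,f_{li}(s) \qquad (11.31)$$

so that each fault $f_{li}$ moves the residual vector only along its own fixed direction $\mathbf{C}\,\mathbf{l}_i$. The sought observer feedback matrix $\mathbf{H}$ now follows from (11.30)

$$\mathbf{H} = [\mathbf{A} - \lambda\mathbf{I}]\,\mathbf{C}^{-1} \qquad (11.32)$$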

If the states are not measurable, some transformations of the model have to be performed, [11.3], [11.16]. An example is given in [11.12].

A further procedure is given by [11.21] and [11.30].
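The design rule (11.32) and the resulting directional residuals can be checked numerically for a process with an invertible $\mathbf{C}$. In the sketch the matrices, the eigenvalue $\lambda = -10$ and the two fault influence vectors $\mathbf{l}_1$, $\mathbf{l}_2$ are hypothetical. After a step fault the residual settles at $-\mathbf{C}\,\mathbf{l}_i\,f_{li}/\lambda$, i.e. along the fault-specific direction $\mathbf{C}\,\mathbf{l}_i$, as (11.31) predicts.

```python
import numpy as np

A = np.array([[-1.0, 0.5], [0.2, -2.0]])   # hypothetical process
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0], [0.0, 1.0]])     # square and invertible: rank C = m
lam = -10.0                                # fast, equal observer eigenvalues
C_inv = np.linalg.inv(C)
H = (A - lam * np.eye(2)) @ C_inv          # (11.32)
assert np.allclose(C @ A @ C_inv - C @ H, lam * np.eye(2))   # (11.30) holds

def residual_after_step_fault(l_i, f=1.0, dt=1e-3, steps=2000):
    """Simulate process (11.22) and observer (11.27); return r = y - C x_hat at the end."""
    x, x_hat = np.zeros(2), np.zeros(2)
    u = np.array([1.0])
    for _ in range(steps):
        r = C @ x - C @ x_hat                             # residual (11.23)
        x = x + dt * (A @ x + B @ u + l_i * f)
        x_hat = x_hat + dt * (A @ x_hat + B @ u + H @ r)
    return C @ x - C @ x_hat

for l_i in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):  # hypothetical l_1, l_2
    # residual vs. the predicted steady value C l_i / (-lambda)
    print(residual_after_step_fault(l_i), C @ l_i / -lam)
```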

Bank of observers (dedicated observers):

Another possibility to distinguish between different faults is to use special observers with different inputs and outputs. The observers with the missing inputs or outputs then do not show changes for the corresponding input and output faults. The following schemes are described for additive sensor faults

$$\mathbf{y} = \mathbf{C}\,\mathbf{x} + \mathbf{M}\,\mathbf{f}_m \qquad (11.33)$$

with

$$\mathbf{f}_m^T = [f_{m1}\;\; f_{m2}\;\; \cdots\;\; f_{mr}] \qquad (11.34)$$

A first observer bank is obtained by using all inputs $\mathbf{u}$ but only one output for each observer, compare Figure 11.3a, [11.7],

$$y_i = \mathbf{c}_i^T\,\mathbf{x} + f_{mi} \qquad (11.35)$$

and the observer equation (11.5) results in

$$\dot{\hat{\mathbf{x}}} = [\mathbf{A} - \mathbf{h}_i\,\mathbf{c}_i^T]\,\hat{\mathbf{x}} + \mathbf{B}\,\mathbf{u} + \mathbf{h}_i\,y_i \qquad (11.36)$$

leading to a residual as output error

$$r_i = y_i - \mathbf{c}_i^T\,\hat{\mathbf{x}} \qquad (11.37)$$

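Therefore, only the $k$-th output error or residual is affected by a sensor fault $f_{mk}$, and all other residuals $r_i$, $i = 1, \ldots, r$, $i \neq k$, should be zero. Thus, by analyzing the patterns of the residuals, the additive sensor fault can be isolated. Note that this scheme requires each pair $(\mathbf{A}, \mathbf{c}_i^T)$ to be observable.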
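The dedicated observer scheme (11.35) to (11.37) can be sketched for a hypothetical process with two outputs, each of which alone makes the process observable. Each observer uses all inputs but only its own output; the gains are placed with the dual Ackermann formula (a design choice made for this sketch), and an offset $f_{m2}$ is added to sensor 2 from $t = 2$ s on. Only $r_2$ reacts, which is the isolation pattern described above.

```python
import numpy as np

def observer_gain_ackermann(A, c, poles):
    """Gain h with eig(A - h c^T) = poles for an observable single-output pair (A, c^T)."""
    n = A.shape[0]
    c = np.atleast_2d(c)
    O = np.vstack([c @ np.linalg.matrix_power(A, i) for i in range(n)])
    pA = np.eye(n, dtype=complex)
    for s in poles:
        pA = pA @ (A - s * np.eye(n))
    e_n = np.zeros((n, 1))
    e_n[-1, 0] = 1.0
    return np.real(pA @ np.linalg.solve(O, e_n)).ravel()

A = np.array([[0.0, 1.0], [-2.0, -3.0]])              # hypothetical process
B = np.array([0.0, 1.0])
c_rows = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
h = [observer_gain_ackermann(A, c_i, [-5.0, -6.0]) for c_i in c_rows]

dt, steps, k_fault, f_m2 = 1e-3, 6000, 2000, 0.5       # sensor-2 offset from t = 2 s
x = np.zeros(2)
x_hat = [np.zeros(2), np.zeros(2)]
r = np.zeros((steps, 2))
for k in range(steps):
    u = 1.0
    y = np.array([c_i @ x for c_i in c_rows])
    if k >= k_fault:
        y[1] += f_m2                                  # additive sensor fault f_m2
    for i, c_i in enumerate(c_rows):
        r[k, i] = y[i] - c_i @ x_hat[i]               # residual (11.37)
        # observer (11.36), written as A x_hat + B u + h_i (y_i - c_i^T x_hat)
        x_hat[i] = x_hat[i] + dt * (A @ x_hat[i] + B * u + h[i] * r[k, i])
    x = x + dt * (A @ x + B * u)

print(np.abs(r[k_fault:]).max(axis=0))                # approx. [0, 0.5]: only r_2 reacts
```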

Another possibility is to use all outputs $\mathbf{y}$ but only one input for each observer, [11.31], [11.26], Figure 11.3b. This scheme is designed to detect input faults $f_{li}$.

The bank of observers scheme can be expanded to all inputs and all outputs except one input or one output for which the fault is modelled. Then the observability or controllability is improved. This is called the generalized observer scheme, [11.9].
Fig. 11.3. Bank of observers (dedicated observers): (a) all inputs, one output; (b) one input, all outputs

11.2 State estimation (Kalman filter) 

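For the linear multi-input multi-output process with discrete-time signals and without stochastic disturbances

$$\mathbf{x}(k+1) = \mathbf{A}\,\mathbf{x}(k) + \mathbf{B}\,\mathbf{u}(k) \qquad (11.38)$$

$$\mathbf{y}(k) = \mathbf{C}\,\mathbf{x}(k) \qquad (11.39)$$

the state observer equation becomes, corresponding to (11.3), (11.4),

$$\hat{\mathbf{x}}(k+1) = \mathbf{A}\,\hat{\mathbf{x}}(k) + \mathbf{B}\,\mathbf{u}(k) + \mathbf{H}\,[\mathbf{y}(k) - \mathbf{C}\,\hat{\mathbf{x}}(k)] \qquad (11.40)$$

The equation error as an output error then is

$$\mathbf{e}(k) = \mathbf{y}(k) - \mathbf{C}\,\hat{\mathbf{x}}(k) \qquad (11.41)$$

and the state error equation according to (11.7) becomes

$$\tilde{\mathbf{x}}(k+1) = [\mathbf{A} - \mathbf{H}\,\mathbf{C}]\,\tilde{\mathbf{x}}(k) \qquad (11.42)$$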

If no disturbances act on the process, the observer converges to the true state variables if the eigenvalues of $\mathbf{A} - \mathbf{H}\,\mathbf{C}$ are asymptotically stable, i.e. lie inside the unit circle. The speed of convergence can be made fast by a large observer gain $\mathbf{H}$. However, under the influence of stochastic disturbances the state reconstruction with these observers is not optimal. The state reconstruction must then simultaneously follow the true state variables and reject the noise effects, which leads to an estimation problem.

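The process is now supplemented by stochastic noise $\mathbf{v}(k)$ at the input and $\mathbf{n}(k)$ at the output

$$\mathbf{x}(k+1) = \mathbf{A}\,\mathbf{x}(k) + \mathbf{B}\,\mathbf{u}(k) + \mathbf{V}\,\mathbf{v}(k) \qquad (11.43)$$

$$\mathbf{y}(k) = \mathbf{C}\,\mathbf{x}(k) + \mathbf{n}(k) \qquad (11.44)$$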
The process matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ and $\mathbf{V}$ are assumed to be known. The initial state $\mathbf{x}(0)$ is not known, but probabilistic information is available about $\mathbf{x}(0)$ and also about $\mathbf{v}(k)$ and $\mathbf{n}(k)$. These stochastic variables are assumed to be statistically independent and to have a normal (Gaussian) distribution with the mean values

$$E\{\mathbf{x}(0)\} = \bar{\mathbf{x}}_0; \quad E\{\mathbf{v}(k)\} = \mathbf{0}; \quad E\{\mathbf{n}(k)\} = \mathbf{0} \qquad (11.45)$$

and the covariance matrices

$$E\{(\mathbf{x}(0) - \bar{\mathbf{x}}_0)(\mathbf{x}(0) - \bar{\mathbf{x}}_0)^T\} = \mathbf{X}_0, \quad E\{\mathbf{v}(k)\,\mathbf{v}^T(k)\} = \mathbf{M}, \quad E\{\mathbf{n}(k)\,\mathbf{n}^T(k)\} = \mathbf{N} \qquad (11.46)$$

Furthermore, it is assumed that $\mathbf{M}$ and $\mathbf{N}$ are known in order to have a measure of the size of the noise.

As the state estimation error cannot converge to zero, a best estimate has to be found for the state vector $\mathbf{x}(k)$ based on the measured input variables $\mathbf{u}(k)$ and output variables $\mathbf{y}(k)$. A least squares estimation then requires

$$\min \left\| \mathbf{x}(k) - \hat{\mathbf{x}}(k|j) \right\|^2 \qquad (11.47)$$

Two different time instants are used here: $k$ denotes the present time and $j$ the time instant up to which measurements are used. The state estimation problems are then given different names, [11.23]:

$k > j$: prediction problem;
$k = j$: filtering problem;
$k < j$: smoothing problem.

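The filtering problem and the one-step-ahead prediction are considered further. The used measurements of the output are

$$\mathbf{Y}_j = \{\mathbf{y}(0), \mathbf{y}(1), \ldots, \mathbf{y}(j)\} \qquad (11.48)$$

The following notation is used:

Optimal estimates:

$$\hat{\mathbf{x}}(k|j) = E\{\mathbf{x}(k)\,|\,\mathbf{Y}_j\} \qquad (11.49)$$

Estimation error:

$$\tilde{\mathbf{x}}(k|j) = \mathbf{x}(k) - \hat{\mathbf{x}}(k|j) \qquad (11.50)$$

Covariance matrices of the estimation error:

$$\mathbf{P}^-(k+1) = E\{\tilde{\mathbf{x}}(k+1|k)\,\tilde{\mathbf{x}}^T(k+1|k)\} \qquad (11.51)$$

$$\mathbf{P}(k+1) = E\{\tilde{\mathbf{x}}(k+1|k+1)\,\tilde{\mathbf{x}}^T(k+1|k+1)\} \qquad (11.52)$$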


For time instant $k+1$ the state variable $\mathbf{x}(k+1)$ can be predicted by using the state model (11.43) with the information at time $k$

$$\hat{\mathbf{x}}(k+1|k) = \mathbf{A}\,\hat{\mathbf{x}}(k|k) + \mathbf{B}\,\mathbf{u}(k) + \mathbf{V}\,\bar{\mathbf{v}} \qquad (11.53)$$

as the exact $\mathbf{v}(k)$ is unknown. With the assumption $E\{\mathbf{v}(k)\} = \bar{\mathbf{v}} = \mathbf{0}$ it follows

$$\hat{\mathbf{x}}(k+1|k) = \mathbf{A}\,\hat{\mathbf{x}}(k|k) + \mathbf{B}\,\mathbf{u}(k) \qquad (11.54)$$

At time $k+1$ the measurement of the output $\mathbf{y}(k+1)$ is also available. It holds

$$\mathbf{y}(k+1) = \mathbf{C}\,\mathbf{x}(k+1) + \mathbf{n}(k+1) \qquad (11.55)$$

However, $\mathbf{x}(k+1)$ is unknown. The prediction $\hat{\mathbf{x}}(k+1|k)$ is disturbed by the noise $\mathbf{v}(k)$, and the measurable output $\mathbf{y}(k+1)$ for $\mathbf{x}(k+1)$ by $\mathbf{n}(k+1)$. It is assumed that $\mathbf{v}(k)$ and $\mathbf{n}(k)$ are statistically independent.

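If both $\hat{\mathbf{x}}(k+1|k)$ and $\mathbf{x}(k+1)$ were known, one could calculate a weighted mean as an estimate

$$\hat{\mathbf{x}}(k+1|k+1) = (\mathbf{I} - \mathbf{K}')\,\hat{\mathbf{x}}(k+1|k) + \mathbf{K}'\,\mathbf{x}(k+1) = \hat{\mathbf{x}}(k+1|k) + \mathbf{K}'\,[\mathbf{x}(k+1) - \hat{\mathbf{x}}(k+1|k)] \qquad (11.56)$$

where $\mathbf{K}'$ is an $(m \times m)$ weighting matrix which is to be chosen such that the covariance of the estimation error $\mathbf{P}(k+1)$ becomes a minimum. Now, instead of $\mathbf{x}(k+1)$, the measurable output vector $\mathbf{y}(k+1)$ is used. Then with $\mathbf{K}' = \mathbf{K}\,\mathbf{C}$ it holds

$$\begin{aligned} \hat{\mathbf{x}}(k+1|k+1) &= \hat{\mathbf{x}}(k+1|k) + \mathbf{K}\,\mathbf{C}\,[\mathbf{x}(k+1) - \hat{\mathbf{x}}(k+1|k)] \\ &= \hat{\mathbf{x}}(k+1|k) + \mathbf{K}\,[\mathbf{y}(k+1) - \mathbf{C}\,\hat{\mathbf{x}}(k+1|k)] \\ &= [\mathbf{I} - \mathbf{K}\,\mathbf{C}]\,\hat{\mathbf{x}}(k+1|k) + \mathbf{K}\,\mathbf{y}(k+1) \end{aligned} \qquad (11.57)$$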

This equation contains:

$\hat{\mathbf{x}}(k+1|k)$: the model prediction of $\mathbf{x}(k+1)$ based on the last estimate $\hat{\mathbf{x}}(k|k)$, (11.54);
$\mathbf{y}(k+1)$: the new measurement.

A recursive estimation algorithm then follows from (11.57)

$$\hat{\mathbf{x}}(k+1|k+1) = \hat{\mathbf{x}}(k+1|k) + \mathbf{K}(k+1)\,[\mathbf{y}(k+1) - \mathbf{C}\,\hat{\mathbf{x}}(k+1|k)] \qquad (11.58)$$

where the correction matrix $\mathbf{K}(k+1)$ has to be chosen so as to minimize the covariance matrix of the estimation error. As this covariance changes with time, $\mathbf{K}(k+1)$ must also be time-variant.

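The error of the prediction is

$$\tilde{\mathbf{x}}(k+1|k) = \hat{\mathbf{x}}(k+1|k) - E\{\hat{\mathbf{x}}(k+1|k)\} \qquad (11.59)$$

and the error in the measurement, see (11.44),

$$\tilde{\mathbf{y}}(k+1) = \mathbf{y}(k+1) - E\{\mathbf{y}(k+1)\} = \mathbf{n}(k+1) \qquad (11.60)$$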
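The corresponding covariance matrices are

$$\mathbf{P}^-(k+1) = E\{\tilde{\mathbf{x}}(k+1|k)\,\tilde{\mathbf{x}}^T(k+1|k)\} \qquad (11.61)$$

$$\tilde{\mathbf{Y}} = E\{\tilde{\mathbf{y}}(k+1)\,\tilde{\mathbf{y}}^T(k+1)\} = E\{\mathbf{n}(k+1)\,\mathbf{n}^T(k+1)\} = \mathbf{N} \qquad (11.62)$$

The covariance matrix of the recursive estimate $\hat{\mathbf{x}}(k+1|k+1)$ then becomes with (11.57)

$$\begin{aligned} \mathbf{P}(k+1) &= E\{\tilde{\mathbf{x}}(k+1|k+1)\,\tilde{\mathbf{x}}^T(k+1|k+1)\} \\ &= E\Big\{\big[(\mathbf{I} - \mathbf{K}(k+1)\mathbf{C})(\hat{\mathbf{x}}(k+1|k) - E\{\hat{\mathbf{x}}(k+1|k)\}) + \mathbf{K}(k+1)(\mathbf{y}(k+1) - E\{\mathbf{y}(k+1)\})\big] \\ &\qquad \big[(\mathbf{I} - \mathbf{K}(k+1)\mathbf{C})(\hat{\mathbf{x}}(k+1|k) - E\{\hat{\mathbf{x}}(k+1|k)\}) + \mathbf{K}(k+1)(\mathbf{y}(k+1) - E\{\mathbf{y}(k+1)\})\big]^T\Big\} \\ &= [\mathbf{I} - \mathbf{K}(k+1)\mathbf{C}]\,\mathbf{P}^-(k+1)\,[\mathbf{I} - \mathbf{K}(k+1)\mathbf{C}]^T + \mathbf{K}(k+1)\,\mathbf{N}\,\mathbf{K}^T(k+1) \end{aligned} \qquad (11.63)$$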

Now a value of $\mathbf{K}(k+1)$ is sought which minimizes the covariance of the estimation error. To find this minimum without differentiation, (11.63) is modified. As shown in [11.1] and [11.14], it can be formed into a complete square in $\mathbf{K}(k+1)$, which after several matrix calculations leads to

$$\mathbf{K}(k+1) = \mathbf{P}^-(k+1)\,\mathbf{C}^T\,[\mathbf{C}\,\mathbf{P}^-(k+1)\,\mathbf{C}^T + \mathbf{N}]^{-1} \qquad (11.64)$$

and

$$\mathbf{P}(k+1) = \mathbf{P}^-(k+1) - \mathbf{K}(k+1)\,\mathbf{C}\,\mathbf{P}^-(k+1) \qquad (11.65)$$

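Herewith $\mathbf{P}^-$ follows from (11.51) and (11.59) with $E\{\hat{\mathbf{x}}(k+1|k)\} = E\{\mathbf{x}(k+1)\}$:

$$\begin{aligned} \mathbf{P}^-(k+1) &= E\{(\hat{\mathbf{x}}(k+1|k) - E\{\mathbf{x}(k+1)\})(\hat{\mathbf{x}}(k+1|k) - E\{\mathbf{x}(k+1)\})^T\} \\ &= \mathbf{A}\,\mathbf{P}(k)\,\mathbf{A}^T + \mathbf{V}\,\mathbf{M}\,\mathbf{V}^T \end{aligned} \qquad (11.66)$$

with the covariance matrix $\mathbf{P}(k)$ of the estimation error $\tilde{\mathbf{x}}(k|k)$ according to (11.52). Hence, the sequence of calculations is

Prediction (from (11.53) and (11.66)):

$$\hat{\mathbf{x}}(k+1|k) = \mathbf{A}\,\hat{\mathbf{x}}(k|k) + \mathbf{B}\,\mathbf{u}(k) \qquad (11.67)$$

$$\mathbf{P}^-(k+1) = \mathbf{A}\,\mathbf{P}(k)\,\mathbf{A}^T + \mathbf{V}\,\mathbf{M}\,\mathbf{V}^T \qquad (11.68)$$

Correction (from (11.64), (11.58) and (11.65)):

$$\mathbf{K}(k+1) = \mathbf{P}^-(k+1)\,\mathbf{C}^T\,[\mathbf{C}\,\mathbf{P}^-(k+1)\,\mathbf{C}^T + \mathbf{N}]^{-1} \qquad (11.69)$$

$$\hat{\mathbf{x}}(k+1|k+1) = \hat{\mathbf{x}}(k+1|k) + \mathbf{K}(k+1)\,[\mathbf{y}(k+1) - \mathbf{C}\,\hat{\mathbf{x}}(k+1|k)] \qquad (11.70)$$

$$\mathbf{P}(k+1) = [\mathbf{I} - \mathbf{K}(k+1)\,\mathbf{C}]\,\mathbf{P}^-(k+1) \qquad (11.71)$$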
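If the prediction (11.67) is inserted in (11.70), it follows

$$\underbrace{\hat{\mathbf{x}}(k+1|k+1)}_{\text{new estimate}} = \underbrace{\mathbf{A}\,\hat{\mathbf{x}}(k|k) + \mathbf{B}\,\mathbf{u}(k)}_{\text{prediction from old estimate}} + \underbrace{\mathbf{K}(k+1)}_{\text{correction matrix}}\Big[\underbrace{\mathbf{y}(k+1)}_{\text{new measurement}} - \underbrace{\mathbf{C}\,(\mathbf{A}\,\hat{\mathbf{x}}(k|k) + \mathbf{B}\,\mathbf{u}(k))}_{\text{predicted measurement based on old estimate}}\Big] \qquad (11.72)$$

These filtering equations are called the Kalman filter.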
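The prediction and correction steps (11.67) to (11.71) translate almost line by line into code. The sketch applies them to a hypothetical sampled position-velocity process with a single noisy position measurement; all numerical values (sampling time, $\mathbf{M}$, $\mathbf{N}$, input signal) are assumptions chosen for illustration. The innovation $\mathbf{y}(k+1) - \mathbf{C}\,\hat{\mathbf{x}}(k+1|k)$ is returned as well, since it is the quantity used as residual when the filter serves for fault detection.

```python
import numpy as np

def kalman_step(x_est, P, u, y_next, A, B, C, V, M, N):
    """One Kalman filter cycle: prediction (11.67), (11.68), correction (11.69)-(11.71)."""
    x_pred = A @ x_est + B @ u                        # (11.67)
    P_pred = A @ P @ A.T + V @ M @ V.T                # (11.68)
    S = C @ P_pred @ C.T + N
    K = P_pred @ C.T @ np.linalg.inv(S)               # (11.69)
    innovation = y_next - C @ x_pred
    x_new = x_pred + K @ innovation                   # (11.70)
    P_new = (np.eye(len(x_est)) - K @ C) @ P_pred     # (11.71)
    return x_new, P_new, innovation

T = 0.1                                               # hypothetical sampling time
A = np.array([[1.0, T], [0.0, 1.0]])
B = np.array([[0.5 * T**2], [T]])
C = np.array([[1.0, 0.0]])
V = np.eye(2)
M = np.diag([1e-6, 1e-4])                             # hypothetical state noise covariance
N = np.array([[0.25]])                                # measurement noise variance (std 0.5)

rng = np.random.default_rng(2)
x, x_est, P = np.zeros(2), np.zeros(2), np.eye(2)
err_est, err_meas = [], []
for k in range(500):
    u = np.array([np.sin(0.05 * k)])
    x = A @ x + B @ u + V @ rng.multivariate_normal(np.zeros(2), M)   # (11.43)
    y = C @ x + rng.normal(0.0, 0.5, size=1)                          # (11.44)
    x_est, P, _ = kalman_step(x_est, P, u, y, A, B, C, V, M, N)
    err_est.append(x_est[0] - x[0])
    err_meas.append(y[0] - x[0])

rmse_est = np.sqrt(np.mean(np.square(err_est)))
rmse_meas = np.sqrt(np.mean(np.square(err_meas)))
print(f"{rmse_est:.3f} {rmse_meas:.3f}")   # filtered error clearly below the raw measurement error
```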

The state estimation is a recursive estimation of the state $\hat{\mathbf{x}}(k+1|k+1)$ based on a state $\hat{\mathbf{x}}(k+1|k)$ predicted by the process model and a correction based on the new measurement $\mathbf{y}(k+1)$. A comparison with the state observer (11.40) shows that the observer only uses past information $\hat{\mathbf{x}}(k)$ and $\mathbf{y}(k)$, and not the predicted values $\hat{\mathbf{x}}(k+1|k)$ and the new measurement $\mathbf{y}(k+1)$.

The correction matrix or gain $\mathbf{K}(k+1)$ depends on the covariance matrices $\mathbf{M}$ of the state noise $\mathbf{v}(k)$ and $\mathbf{N}$ of the output noise $\mathbf{n}(k)$. It can be computed in advance, as it does not depend on measured signals.

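If the process matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ and the noise covariance matrices do not depend on time, the Kalman filter gain $\mathbf{K}(k+1)$ asymptotically approaches a steady-state value $\mathbf{K}$. The steady-state estimation error covariance matrix follows from the elimination of $\mathbf{P}(k)$ from (11.68) using (11.71) and (11.69), leading to

$$\mathbf{P}^-(k+1) = \mathbf{A}\,\mathbf{P}^-(k)\,\mathbf{A}^T - \mathbf{A}\,\mathbf{P}^-(k)\,\mathbf{C}^T\,[\mathbf{C}\,\mathbf{P}^-(k)\,\mathbf{C}^T + \mathbf{N}]^{-1}\,\mathbf{C}\,\mathbf{P}^-(k)\,\mathbf{A}^T + \mathbf{V}\,\mathbf{M}\,\mathbf{V}^T \qquad (11.73)$$

which is a Riccati equation. (One recognizes the duality to optimal state control.) Its asymptotic solution results in the steady-state matrix $\bar{\mathbf{P}}^-$. Then the steady-state Kalman filter gain becomes

$$\mathbf{K} = \bar{\mathbf{P}}^-\,\mathbf{C}^T\,[\mathbf{C}\,\bar{\mathbf{P}}^-\,\mathbf{C}^T + \mathbf{N}]^{-1} \qquad (11.74)$$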

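The calculations then reduce to

prediction:

$$\hat{\mathbf{x}}(k+1|k) = \mathbf{A}\,\hat{\mathbf{x}}(k|k) + \mathbf{B}\,\mathbf{u}(k) \qquad (11.75)$$

correction:

$$\hat{\mathbf{x}}(k+1|k+1) = \hat{\mathbf{x}}(k+1|k) + \mathbf{K}\,[\mathbf{y}(k+1) - \mathbf{C}\,\hat{\mathbf{x}}(k+1|k)] \qquad (11.76)$$

To obtain a comparison to a state observer, the previous correction (11.76)

$$\hat{\mathbf{x}}(k|k) = \hat{\mathbf{x}}(k|k-1) + \mathbf{K}\,[\mathbf{y}(k) - \mathbf{C}\,\hat{\mathbf{x}}(k|k-1)] \qquad (11.77)$$

is inserted in the prediction (11.75), leading to

$$\hat{\mathbf{x}}(k+1|k) = \mathbf{A}\,\hat{\mathbf{x}}(k|k-1) + \mathbf{B}\,\mathbf{u}(k) + \mathbf{A}\,\mathbf{K}\,[\mathbf{y}(k) - \mathbf{C}\,\hat{\mathbf{x}}(k|k-1)] \qquad (11.78)$$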
A comparison with the observer (11.40) shows that if the observer gain is chosen as

$$\mathbf{H} = \mathbf{A}\,\mathbf{K} \qquad (11.79)$$

the observer equation corresponds to the Kalman filter.
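For a time-invariant process the Riccati recursion (11.73) can simply be iterated until it settles, which yields $\bar{\mathbf{P}}^-$, the steady-state gain (11.74) and the equivalent observer gain (11.79). The scalar case $A = C = V = 1$, $M = N = 1$ used below is a hypothetical example with a closed-form answer: $\bar{P}^-$ solves $P^2 - P - 1 = 0$, so $\bar{P}^- = (1+\sqrt{5})/2 \approx 1.618$ and $K = H = 1/\bar{P}^- \approx 0.618$.

```python
import numpy as np

def steady_state_kalman(A, C, V, M, N, tol=1e-12, max_iter=10_000):
    """Iterate the Riccati equation (11.73); return steady P^-, K (11.74) and H = A K (11.79)."""
    P = np.eye(A.shape[0])                        # any positive definite start value
    for _ in range(max_iter):
        S = C @ P @ C.T + N
        P_next = (A @ P @ A.T
                  - A @ P @ C.T @ np.linalg.solve(S, C @ P @ A.T)
                  + V @ M @ V.T)                  # (11.73)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + N)  # (11.74)
    return P, K, A @ K                            # H = A K, (11.79)

one = np.array([[1.0]])
P_bar, K, H = steady_state_kalman(A=one, C=one, V=one, M=one, N=one)
print(P_bar.item(), K.item(), H.item())       # approx. 1.618, 0.618, 0.618
print(np.linalg.eigvals(one - H @ one))       # observer pole approx. [0.382], inside unit circle
```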

Figure 11.4 shows a signal flow diagram of the Kalman filter using (11.77) and (11.75). (This scheme, as well as (11.77) and (11.78), is depicted for the case that at time instant $k$ the input signal $\mathbf{u}(k)$ and the output signal $\mathbf{y}(k)$ are available, as for an observer. If, however, the next output sample $\mathbf{y}(k+1)$ is used for estimating $\hat{\mathbf{x}}(k+1|k+1)$, the correction follows (11.76).)

Fig. 11.4. Signal flow diagram of a Kalman filter with the prediction according to (11.75) and the correction according to (11.77)


In the original work of [11.17] the recursive state estimator was derived by applying the orthogonality condition between the estimation errors and the measurements

$$E\{\tilde{\mathbf{x}}(i)\,\mathbf{y}^T(j)\} = \mathbf{0} \quad \text{for } j < i \qquad (11.80)$$

An alternative derivation of the Kalman filter uses the conditional expectation as a least squares estimate

$$E\{\mathbf{x}(k)\,|\,\mathbf{Y}_j\} \qquad (11.81)$$

see (11.47) and (11.48). Hence, the estimate is the orthogonal projection of $\mathbf{x}(k)$ on $\mathbf{Y}_j$

$$\hat{\mathbf{x}}(k|j) = E\{\mathbf{x}(k)\,|\,\mathbf{Y}_j\} \qquad (11.82)$$

with the estimation error

$$\tilde{\mathbf{x}}(k|j) = \mathbf{x}(k) - \hat{\mathbf{x}}(k|j)$$

see, e.g. [11.23]. Other references for the Kalman filter are [11.4], [11.10], [11.15], [11.2], [11.19], [11.28], [11.29].

An early application of a Kalman filter for fault detection, excited by the outputs of a multi-output process, is shown by [11.22]. The residual (innovation) loses its character of zero-mean white noise with known covariance if a fault appears. See also [11.31]. In principle the application of the Kalman filter is similar to that of state observers, but it is especially suitable for processes with relatively large state variable and output noise.


11.3 Output observers 

The classical state observer or Kalman filter primarily reconstructs or estimates the state variables. However, if faults in the state variables are not of interest, so-called output observers or functional observers can be applied for fault detection. The goal is to reconstruct the output by a state-space model which generates residuals only for faults $\mathbf{f}(t)$, but not for the measurable inputs $\mathbf{u}(t)$ and the nonmeasurable (unknown) inputs $\mathbf{v}(t)$. There exists a rich literature on unknown input observers, see, e.g. [11.27], [11.6], [11.5]. The following derivation follows [11.8] and [11.13].

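The process is described by

$$\dot{\mathbf{x}}(t) = \mathbf{A}\,\mathbf{x}(t) + \mathbf{B}\,\mathbf{u}(t) + \mathbf{V}\,\mathbf{v}(t) + \mathbf{L}\,\mathbf{f}(t) \qquad (11.83)$$

$$\mathbf{y}(t) = \mathbf{C}\,\mathbf{x}(t) + \mathbf{M}\,\mathbf{f}(t) \qquad (11.84)$$

The goal is now to generate residuals which are independent of the unknown inputs $\mathbf{v}(t)$. Therefore a linear transformation

$$\boldsymbol{\xi}(t) = \mathbf{T}_1\,\mathbf{x}(t) \qquad (11.85)$$

is applied to build an observer with new state variables $\boldsymbol{\xi}(t)$. Let the output of the observer be $\hat{\boldsymbol{\eta}}(t)$. Therefore no direct output error with the process output $\mathbf{y}(t)$ results, but an output error with a transformed output

$$\boldsymbol{\eta}(t) = \mathbf{T}_2\,\mathbf{y}(t) \qquad (11.86)$$

The transformed process model without fault and disturbance influences then becomes

$$\dot{\boldsymbol{\xi}}(t) = \mathbf{A}_\xi\,\boldsymbol{\xi}(t) + \mathbf{B}_\xi\,\mathbf{u}(t) \qquad (11.87)$$

$$\boldsymbol{\eta}(t) = \mathbf{C}_\xi\,\boldsymbol{\xi}(t) \qquad (11.88)$$

and the corresponding state observer, compare Figure 11.5,

$$\dot{\hat{\boldsymbol{\xi}}}(t) = \mathbf{A}_\xi\,\hat{\boldsymbol{\xi}}(t) + \mathbf{B}_\xi\,\mathbf{u}(t) + \mathbf{H}_\xi\,\mathbf{y}(t) \qquad (11.89)$$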
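$$\hat{\boldsymbol{\eta}}(t) = \mathbf{C}_\xi\,\hat{\boldsymbol{\xi}}(t) \qquad (11.90)$$

This observer then does not feed back an error signal of the outputs, but has the character of a parallel process model. The error of the states is

$$\tilde{\boldsymbol{\xi}}(t) = \hat{\boldsymbol{\xi}}(t) - \mathbf{T}_1\,\mathbf{x}(t) \qquad (11.91)$$

Inserting (11.83) and (11.85) leads to

$$\begin{aligned} \dot{\tilde{\boldsymbol{\xi}}}(t) &= \dot{\hat{\boldsymbol{\xi}}}(t) - \mathbf{T}_1\,\dot{\mathbf{x}}(t) \\ &= \mathbf{A}_\xi\,\tilde{\boldsymbol{\xi}}(t) + (\mathbf{A}_\xi\,\mathbf{T}_1 + \mathbf{H}_\xi\,\mathbf{C} - \mathbf{T}_1\,\mathbf{A})\,\mathbf{x}(t) + (\mathbf{B}_\xi - \mathbf{T}_1\,\mathbf{B})\,\mathbf{u}(t) \\ &\quad - \mathbf{T}_1\,\mathbf{V}\,\mathbf{v}(t) + (\mathbf{H}_\xi\,\mathbf{M} - \mathbf{T}_1\,\mathbf{L})\,\mathbf{f}(t) \end{aligned} \qquad (11.92)$$

and the residual becomes with (11.84)

$$\mathbf{r}(t) = \hat{\boldsymbol{\eta}}(t) - \boldsymbol{\eta}(t) = \mathbf{C}_\xi\,\hat{\boldsymbol{\xi}}(t) - \mathbf{T}_2\,\mathbf{y}(t) \qquad (11.93)$$

$$\mathbf{r}(t) = \mathbf{C}_\xi\,\tilde{\boldsymbol{\xi}}(t) + (\mathbf{C}_\xi\,\mathbf{T}_1 - \mathbf{T}_2\,\mathbf{C})\,\mathbf{x}(t) - \mathbf{T}_2\,\mathbf{M}\,\mathbf{f}(t) \qquad (11.94)$$

To decouple the state error $\tilde{\boldsymbol{\xi}}(t)$ and the residual $\mathbf{r}(t)$ from the unknown state $\mathbf{x}(t)$, from the unknown disturbance $\mathbf{v}(t)$ and from $\mathbf{u}(t)$, the following equations must be satisfied: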
$$\mathbf{T}_1\,\mathbf{A} - \mathbf{A}_\xi\,\mathbf{T}_1 = \mathbf{H}_\xi\,\mathbf{C} \qquad (11.95)$$

$$\mathbf{B}_\xi = \mathbf{T}_1\,\mathbf{B} \qquad (11.96)$$

$$\mathbf{T}_1\,\mathbf{V} = \mathbf{0} \qquad (11.97)$$

$$\mathbf{C}_\xi\,\mathbf{T}_1 - \mathbf{T}_2\,\mathbf{C} = \mathbf{0} \qquad (11.98)$$

In addition, the observer matrix $\mathbf{A}_\xi$ is selected as a diagonal matrix with stable poles. The set of equations (11.95) to (11.98) can be solved by transforming the state equations into Kronecker canonical form, [11.9], by using an eigenstructure assignment for the observer, [11.5], or by an iterative procedure based on singular value decomposition, [11.18]. [11.24] gave a relatively simple solution, see also [11.13].

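The state error finally becomes

$$\dot{\tilde{\boldsymbol{\xi}}}(t) = \mathbf{A}_\xi\,\tilde{\boldsymbol{\xi}}(t) + (\mathbf{H}_\xi\,\mathbf{M} - \mathbf{T}_1\,\mathbf{L})\,\mathbf{f}(t) \qquad (11.99)$$

and the residual

$$\mathbf{r}(t) = \mathbf{C}_\xi\,\tilde{\boldsymbol{\xi}}(t) - \mathbf{T}_2\,\mathbf{M}\,\mathbf{f}(t) \qquad (11.100)$$

As the state errors caused by initial conditions decay asymptotically, the residual then only depends on the faults $\mathbf{f}(t)$ and not on the unknown input $\mathbf{v}(t)$.

Laplace transformation of (11.89) and (11.93) yields

$$\mathbf{r}(s) = \mathbf{C}_\xi\,\hat{\boldsymbol{\xi}}(s) - \mathbf{T}_2\,\mathbf{y}(s) = \left[\mathbf{C}_\xi\,(s\mathbf{I} - \mathbf{A}_\xi)^{-1}\,\mathbf{H}_\xi - \mathbf{T}_2\right]\mathbf{y}(s) + \mathbf{C}_\xi\,[s\mathbf{I} - \mathbf{A}_\xi]^{-1}\,\mathbf{B}_\xi\,\mathbf{u}(s) \qquad (11.101)$$

This is, as for parity equations, the computational form of the residuals, using the measured $\mathbf{u}(t)$ and $\mathbf{y}(t)$. After inserting (11.99) in (11.100) one obtains the internal form

$$\mathbf{r}(s) = \mathbf{C}_\xi\,(s\mathbf{I} - \mathbf{A}_\xi)^{-1}\,(\mathbf{H}_\xi\,\mathbf{M} - \mathbf{T}_1\,\mathbf{L})\,\mathbf{f}(s) - \mathbf{T}_2\,\mathbf{M}\,\mathbf{f}(s) \qquad (11.102)$$

Here the dynamic response of faults on the residual can be seen. This shows that, in addition to the above requirements,

$$\mathbf{C}_\xi\,(s\mathbf{I} - \mathbf{A}_\xi)^{-1}\,(\mathbf{H}_\xi\,\mathbf{M} - \mathbf{T}_1\,\mathbf{L}) - \mathbf{T}_2\,\mathbf{M} \neq \mathbf{0} \qquad (11.103)$$

must also be satisfied, so that the faults actually affect the residual.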

A comparison of (11.102) with the output error residual for parity equations (10.8) indicates similarities. The faults $\mathbf{f}(s)$ are just filtered by another transfer function, here, however, independent of the input disturbances $\mathbf{v}(t)$.

The design of this output observer requires that the unknown disturbance entry matrix $\mathbf{V}$ is known precisely, because it determines the transformation matrix $\mathbf{T}_1$. The other conditions (11.95), (11.96) and (11.98) imply that the process model ($\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$) must also be accurately known. The output observer approach gives relatively much freedom for the design, but at the cost of design effort and transparency.

11.4 Comparison of the parity- and observer-based approaches

It was already mentioned that similarities exist between the different fault-detection approaches based on parity equations and on observers. They all use the same measurable input signals u(t) and output signals y(t), and they assume that the structure and parameters of the process are exactly known and do not change (fixed model) and, if unknown inputs v(t) have to be compensated, that the disturbance entry matrix V is known.

11.4.1 Comparison of residual equations 

In the Laplace domain and without noise terms, the resulting computational forms of the residual equations for continuous time are:

1) Parity equations with transfer functions:

• output error, (10.7):

$$ r'(s) = y(s) - G_m(s)\, u(s) $$

• equation error, (10.11):

$$ r(s) = A_m(s)\, y(s) - B_m(s)\, u(s) $$

with structured residuals, (10.52):

$$ r^*(s) = W(s) \left[ A_m(s)\, y(s) - B_m(s)\, u(s) \right] $$

2) Parity equations with state space models, (10.29):

$$ r(s) = W \left[ L_y(s)\, y(s) - Q\, L_u(s)\, u(s) \right] $$

3) State observer, (11.14):

$$ r(s) = \left[ I - C (sI - A + HC)^{-1} H \right] y(s) - C (sI - A + HC)^{-1} B\, u(s) $$

4) Output observer, (11.101):

$$ r(s) = \left[ C_f (sI - A_f)^{-1} H_f - T_2 \right] y(s) + C_f (sI - A_f)^{-1} B_f\, u(s) $$

Hence, the structure of the residual equations is very similar. They differ in the way 
the input and output measurements are filtered. 
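A short worked case makes this concrete. Assume, only for this illustration, a first-order SISO process with state equation dx/dt = a x(t) + b u(t), output y(t) = c x(t) and a scalar observer gain h. Inserting these scalars into the state-observer residual (11.14) gives:

$$
\begin{aligned}
r(s) &= \left[ 1 - \frac{c\,h}{s - a + h c} \right] y(s) - \frac{c\,b}{s - a + h c}\, u(s) \\
     &= \frac{s - a}{s - a + h c}\, y(s) - \frac{c\,b}{s - a + h c}\, u(s) \\
     &= \frac{1}{s - a + h c} \left[ (s - a)\, y(s) - c\,b\, u(s) \right]
\end{aligned}
$$

Since the process transfer function is G(s) = cb/(s - a), the bracket is exactly the equation error A_m(s) y(s) - B_m(s) u(s) with A_m(s) = s - a and B_m(s) = cb. In this simple case the observer residual is therefore the equation-error residual filtered by a first-order lag whose pole is the observer pole s = a - hc.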

Several authors have compared the various methods for residual generation. For discrete-time realizations, it was shown in [11.9] that the unknown input observer and the parity equation approach are equivalent if the observer is designed with deadbeat behavior, see also [11.5]. [11.20] summarize some results and show the equivalence between Luenberger state observers and parity equations, see also [11.11]. [11.13] compared a state observer, a reduced-order observer with unknown input, and parity equations based on a state space model for a DC motor, and showed theoretically and by experiments that all three approaches lead to the same residual deflection for a fault in the armature resistance, see also the next section.

11.4.2 Comparison by simulations¹

To compare by simulation the residuals resulting from different fault-detection methods based on parity equations and observers, a linear model of a DC motor is considered, see Figure 11.6. As described in Chapter 20, the (simplified) dynamic model results from the armature circuit equation

$$ L_A \dot{I}_A(t) + R_A I_A(t) + \Psi\, \omega(t) = U_A(t) \qquad (11.104) $$

and the equation for the mechanical part

$$ J \dot{\omega}(t) = \Psi I_A(t) - M_F\, \omega(t) - M_L(t) \qquad (11.105) $$

where

U_A  armature voltage (input)
I_A  armature current
ω  angular speed (output)
M_L  load torque (disturbance)
M_F  viscous friction coefficient
L_A  armature inductance
R_A  armature resistance
Ψ  flux linkage


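The state space formulation becomes

$$ \begin{bmatrix} \dot{I}_A(t) \\ \dot{\omega}(t) \end{bmatrix} = \begin{bmatrix} -R_A/L_A & -\Psi/L_A \\ \Psi/J & -M_F/J \end{bmatrix} \begin{bmatrix} I_A(t) \\ \omega(t) \end{bmatrix} + \begin{bmatrix} 1/L_A \\ 0 \end{bmatrix} U_A(t) + \begin{bmatrix} 0 \\ -1/J \end{bmatrix} M_L(t) \qquad (11.106) $$

Fig. 11.6. Signal flow diagram of a permanently excited DC motor

For the generation of parity equations the transfer function

$$ G_m(s) = \frac{\omega(s)}{U_A(s)} = \frac{\Psi}{(L_A s + R_A)(M_F + J s) + \Psi^2} \qquad (11.107) $$

is required, assuming M_L(s) = 0. As L_A ≪ R_A, the armature inductance can be neglected and the model simplifies to

$$ G_m(s) = \frac{\Psi}{J R_A s + \Psi^2 + R_A M_F} = \frac{K_M}{T_M s + 1}, \qquad K_M = \frac{\Psi}{\Psi^2 + R_A M_F}, \qquad T_M = \frac{J R_A}{\Psi^2 + R_A M_F} \qquad (11.108) $$

The residual equations with the output error then become

$$ r'(s) = \omega(s) - G_m(s)\, U_A(s) \qquad (11.109) $$

and with the equation error

$$ r(s) = (T_M s + 1)\, \omega(s) - K_M\, U_A(s) \qquad (11.110) $$

¹ Compiled by Iris Unger, [11.25]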


The reactions of these residuals are shown in Figures 11.7 to 11.9. The additive faults result in the expected deflections. Usable deflections of the residuals for parametric faults are only obtained for ΔΨ, both without input excitation and with an input change ΔU_A(t). If the output is noisy, the output error should be preferred to the equation error.
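The following Python sketch illustrates (11.109) and (11.110) in discrete time. It is a minimal sketch under stated assumptions: a zero-order-hold discretization of the first-order model (11.108) with hypothetical values K_M = 2 (rad/s)/V, T_M = 0.5 s and T_0 = 0.01 s (not the values used in [11.25]), a motor that agrees exactly with the model, and an additive offset fault on the measured speed. The discrete equation error is only a sampled analogue of (11.110); the function name `dc_motor_residuals` is hypothetical.

```python
import math


def dc_motor_residuals(n_steps, k_fault, fault_size,
                       K_M=2.0, T_M=0.5, T0=0.01, U_A=10.0):
    """Output error r'(k) and equation error r(k) for a first-order speed model.

    Hypothetical setup: the real motor equals the model (11.108), a step U_A
    is applied at k = 0, and an additive offset fault_size is added to the
    measured speed from sample k_fault on.
    """
    a = math.exp(-T0 / T_M)      # discrete pole (zero-order-hold discretization)
    b = (1.0 - a) * K_M          # discrete input gain, static gain K_M preserved
    omega_true = 0.0             # true angular speed
    omega_model = 0.0            # output of the parallel model (for r')
    omega_meas_prev = 0.0
    u_prev = 0.0
    r_out, r_eq = [], []
    for k in range(n_steps):
        omega_meas = omega_true + (fault_size if k >= k_fault else 0.0)
        # output error, discrete analogue of (11.109)
        r_out.append(omega_meas - omega_model)
        # equation error, discrete analogue of (11.110)
        r_eq.append(omega_meas - a * omega_meas_prev - b * u_prev)
        # advance process and model by one sample with the same input
        omega_true = a * omega_true + b * U_A
        omega_model = a * omega_model + b * U_A
        omega_meas_prev, u_prev = omega_meas, U_A
    return r_out, r_eq


r_out, r_eq = dc_motor_residuals(n_steps=200, k_fault=100, fault_size=0.5)
print(round(r_out[150], 6), round(r_eq[100], 6), round(r_eq[150], 6))
```

Without the fault both residuals stay at zero. After the offset appears, the output error shows a permanent deflection equal to the offset, whereas the equation error reacts with a pulse followed by only (1 - a) times the offset. Because the equation error weights the difference of successive speed measurements, it also amplifies measurement noise, which fits the recommendation above to prefer the output error for noisy outputs.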



Fig. 11.7. Comparison of different residuals for a DC motor with additive faults, without input 
excitation 


Observer-based residuals are generated by a state observer with one measured output, a state observer with two measured outputs, an output observer (unknown input observer) and a fault-sensitive observer. The observer with one measured input, one measured output and an output residual leads to approximately the same results as the parity equations with output error. If the two outputs I_A(t) and ω(t) are measured, (11.61) can be used to design a Luenberger state observer. The load torque M_L(t) can then be modelled as a disturbance with

$$ V = \begin{bmatrix} 0 \\ -1/J \end{bmatrix} $$
Fig. 11.8. Comparison of different residuals for a DC motor with parametric faults, without input excitation

Fig. 11.9. Comparison of different residuals for a DC motor with parametric faults and stepwise input excitation ΔU_A(t)
The transfer function from the load torque M_L to the residuals of the Luenberger state observer according to Figures 11.1 and 11.2 becomes

$$ G_{rM_L}(s) = W C \left[ sI - A + HC \right]^{-1} V = W F V \qquad (11.111) $$

compare with (11.15).

Decoupling from the load disturbance requires

$$ W F V = 0 \qquad (11.112) $$

Selecting the observer poles as λ_1,2 = -500 1/s leads to

$$ W = \begin{bmatrix} 0.0073 s + 4.5 & 1 \\ 0.0073 s + 4.5 & 1 \end{bmatrix} $$

Therefore, two equal residuals result. As a consequence of the decoupling from M_L(t), the residuals are independent of the mechanical subsystem of the DC motor, and faults of M_F and J cannot be detected.

Figures 11.7 to 11.9 indicate the expected changes of r(t) for the additive faults ΔU_A(t), ΔI_A(t) and Δω(t). However, significant residual changes are only obtained for the parametric faults ΔR_A and ΔΨ in the case of no input excitation, and additionally for ΔL_A with stepwise input excitation.

An output observer was designed by satisfying (11.95) to (11.98), with decoupling from the load disturbance M_L(t) and eigenvalues λ_1,2 = -500 1/s. The matrices C_f and T_2 then have two identical rows, so that two identical residuals result. Condition (11.103) is also satisfied. Figures 11.7 to 11.9 show that almost the same deflections result as with the Luenberger state observer with decoupling from M_L(t).

The design of a fault-sensitive observer requires that all states are measurable (I_A(t) and ω(t)) and that the observer gain according to (11.32) is

$$ H = A - \lambda I \qquad (11.113) $$

As only two faults f_i can be modelled for a second-order process, compare (11.22), ΔU_A(t) and ΔM_L(t) are selected. With λ_1,2 = -500 1/s the observer feedback matrix becomes

$$ H = \begin{bmatrix} 275 & -50 \\ 179 & 500 \end{bmatrix} $$
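A small Python sketch of (11.113): it builds the system matrix of (11.104) and (11.105) for the state x = [I_A, ω]^T, computes H = A - λI with C = I (both states measured), and checks that A - HC = λI. The motor parameters below are hypothetical illustration values, not taken from the book, and the helper names are likewise hypothetical.

```python
def dc_motor_A(R_A, L_A, Psi, J, M_F):
    """System matrix of (11.104)-(11.105) for the state x = [I_A, omega]."""
    return [[-R_A / L_A, -Psi / L_A],
            [Psi / J, -M_F / J]]


def fault_sensitive_gain(A, lam):
    """Observer gain H = A - lam*I of (11.113) for a 2x2 system with C = I."""
    return [[A[0][0] - lam, A[0][1]],
            [A[1][0], A[1][1] - lam]]


def observer_matrix(A, H):
    """Error-dynamics matrix A - H C with C = I."""
    return [[A[i][j] - H[i][j] for j in range(2)] for i in range(2)]


# Hypothetical motor parameters (illustration only, not taken from the book)
A = dc_motor_A(R_A=0.9, L_A=0.004, Psi=0.2, J=0.0011, M_F=1e-5)
H = fault_sensitive_gain(A, lam=-500.0)
print(H)
print(observer_matrix(A, H))   # approximately [[-500, 0], [0, -500]]
```

Because A - HC is diagonal, each component of the estimation error (and hence of the residual, with C = I) behaves like a first-order lag with pole λ driven by the fault terms, so a modelled fault acts on the residual vector along a fixed direction. The derivation of the fault directions themselves is given with (11.22) and (11.32).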

The reactions of the two residuals r_1(t) and r_2(t) depicted in Figures 11.7 to 11.9 show deflections for all additive faults and for the parametric faults ΔR_A, ΔΨ and ΔM_F, without and with input excitation ΔU_A(t). Table 11.1 shows the signs of the deflections. Here, short impulse-like deflections are ignored, because in practice they cannot be distinguished from disturbances. Among the additive faults, ΔI_A and ΔM_L can be isolated, but ΔU_A and Δω cannot be distinguished. Of the parametric faults, only ΔR_A, ΔΨ and ΔM_F show significant changes. However, ΔM_L and ΔM_F cannot be distinguished, and ΔL_A and ΔJ are not detectable. If the sign of the fault is unknown, the following faults cannot be distinguished from each other: ΔU_A, Δω, ΔR_A; ΔI_A, ΔΨ; ΔM_L, ΔM_F. Hence, 7 of the 9 considered faults can be detected, but no unique isolation is possible; only groups of possible faults can be indicated.

Table 11.1. Residual deflections for the fault-sensitive observer and positive faults
(additive faults: ΔU_A, ΔI_A, Δω, ΔM_L; parametric faults: ΔR_A, ΔL_A, ΔΨ, ΔM_F, ΔJ)

     | ΔU_A | ΔI_A | Δω | ΔM_L | ΔR_A | ΔL_A | ΔΨ | ΔM_F | ΔJ
r_1  |  +   |  +   | +  |  0   |  -   |  0   | -  |  0   | 0
r_2  |  0   |  -   | 0  |  -   |  0   |  0   | +  |  -   | +

Table 11.2. Residual deflections for the Kalman filter and positive faults
(additive faults: ΔU_A, ΔI_A, Δω, ΔM_L; parametric faults: ΔR_A, ΔL_A, ΔΨ, ΔM_F, ΔJ)

     | ΔU_A | ΔI_A | Δω | ΔM_L | ΔR_A | ΔL_A | ΔΨ   | ΔM_F | ΔJ
r_1  |  +   |  +   | +  |  0   |  0   |  0   | -    |  0   | 0
r_2  |  +   |  0   | +  |  -   |  -   |  0   | 0(-) |  0   | 0

A Kalman filter was realized as a discrete-time model with sampling time T_0 = 0.001 s. The state estimates are Î_A(k) and ω̂(k). For the design of the Kalman gain the noise variances were chosen as σ_I² = 100 A² and σ_ω² = 100 1/s². The measured signals and the state estimates were averaged over a time window of 500 samples, and the residuals

$$ r_1(k) = I_{A,\mathrm{meas}}(k) - \hat{I}_A(k), \qquad r_2(k) = \omega_{\mathrm{meas}}(k) - \hat{\omega}(k) $$

were averaged over a time window of 10 samples.

Figures 11.7 to 11.9 show the results for a process with small noise (σ_ω² = 1 1/s², σ_I² = 1 A²), to allow a direct comparison with the other residuals. The additive faults cause deflections of r_1 and/or r_2 in all cases. For parametric faults, r_1 and/or r_2 show changes for ΔR_A and ΔΨ, but no reaction for ΔL_A, ΔM_F and ΔJ, both without and with stepwise input excitation ΔU_A(k). A comparison of the signs of the residual changes in Table 11.2 indicates that the additive faults ΔI_A and ΔM_L can be distinguished from ΔU_A and Δω, which both have the same deflections. The parametric faults ΔR_A and ΔΨ can be detected, but ΔR_A and ΔM_L are not distinguishable. If the sign of the fault is unknown, only groups of faults can be indicated: ΔU_A, Δω; ΔI_A, ΔΨ; ΔR_A, ΔM_L. Therefore, also with the Kalman filter no unique isolation of faults is possible.
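The sign comparison used for isolation amounts to a table lookup. The Python sketch below transcribes the sign patterns of Table 11.2 (treating the entry 0(-) for ΔΨ as 0) and returns all faults consistent with an observed pair of residual signs, optionally admitting faults of unknown sign. The fault labels and function name are hypothetical identifiers chosen for this example.

```python
# Residual sign patterns (r1, r2) for positive faults, transcribed from
# Table 11.2 (Kalman filter); the entry 0(-) for the flux linkage is taken as 0.
KALMAN_SIGNS = {
    "dU_A": (+1, +1),
    "dI_A": (+1, 0),
    "domega": (+1, +1),
    "dM_L": (0, -1),
    "dR_A": (0, -1),
    "dL_A": (0, 0),
    "dPsi": (-1, 0),
    "dM_F": (0, 0),
    "dJ": (0, 0),
}


def candidate_faults(observed, table=KALMAN_SIGNS, sign_known=True):
    """Faults whose sign pattern matches the observed residual signs.

    With sign_known=False, faults acting with the opposite sign are admitted too.
    An observation (0, 0) returns an empty list (no fault detected).
    """
    if observed == (0, 0):
        return []
    negated = tuple(-s for s in observed)
    return [fault for fault, pattern in table.items()
            if pattern == observed or (not sign_known and pattern == negated)]


print(candidate_faults((1, 1)))                     # ['dU_A', 'domega']
print(candidate_faults((1, 0), sign_known=False))   # ['dI_A', 'dPsi']
```

The returned groups reproduce the ones stated in the text; undetectable faults (pattern 0, 0) never appear as candidates.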

Summing up, these simulation results for a DC motor lead to the following conclusions (note that the parity equations were designed with a first-order model):

1) Additive faults

• The considered parity equations and state observers lead to about the same residual deflections;

• The considered state observer and the output observer, both designed for decoupling the main disturbance, the load torque, show almost identical residuals;

• The Kalman filter shows large similarities to the observer-based methods and parity equations without disturbance decoupling.

2) Parametric faults

• Out of 5 process parameters only 2 (R_A and Ψ) are detectable as changes by the observer-based methods, but they cannot be distinguished from some additive faults;

• In most cases input excitation has no effect on the residuals, except for ΔL_A with the state observer with load decoupling;

• None of the methods detects the parametric faults ΔL_A and ΔJ (except ΔL_A with the state observer).

A comparison of parity equations and recursive parameter estimation with measure¬ 
ments for real DC motors is presented in Chapter 20. 


11.5 Problems 

1) State the differences between fault detection with state observers and with parity equations for processes with one input and one output and for processes with two inputs and two outputs. Compare the a priori knowledge, design effort, computational effort, noise sensitivity, and detection of additive and multiplicative faults.

2) What are the differences between state observers and a Kalman filter with regard 
to fault detection? 

3) What are the advantages of output observers compared to state observers and 
parity equations? 

4) Design a state observer for Problem 5) in Chapter 10 (electrical steering system). 



12 


Fault detection of control loops 


The main goals of using automatic control loops are precise following of reference variables (set points), a faster response than in open loop, compensation of all kinds of external disturbances on the controlled variable, stabilization of unstable processes, reduction of the influence of process parameter changes on the static and dynamic behavior, partial compensation of actuator and process nonlinearities and, of course, replacement of manual control by humans. The performance of a SISO control loop with regard to the control error (deviation)

$$ e(k) = w(k) - y(k) \qquad (12.1) $$

i.e. the deviation of the controlled variable y(k) from the reference variable w(k), depends on many factors, compare Figure 12.1, like:

• external disturbances;

• structure and parameters of the controller G_c and controller faults f_c;

• changes of the structure and parameters of the process G_p and process faults f_p;

• changes and faults of the actuator G_a and f_a;

• faults f_s in the sensor G_s and measurement noise n_s.

Hence, many changes and faults influence the performance of closed loops. Usually, only the control deviation e and the controlled variable y are monitored.


12.1 Effects of faults on the closed loop performance 

Small faults in the actuator and process, be they additive or multiplicative, will usually be compensated by the feedback controller (with integral action), and they will not be detectable by considering e(k) and y(k) only, as long as the control deviation returns to approximately zero. Small sensor offset faults will not be detected either: the controller simply makes the faulty sensor signal equal to the reference variable. An offset fault can usually only be detected by a redundant sensor or other redundant information about the controlled variable.
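The sensor-offset case can be reproduced with a few lines of Python. This is a sketch under stated assumptions: a hypothetical first-order process y(k+1) = 0.9 y(k) + 0.1 u(k) with unit static gain and a PI controller in velocity form with hypothetical settings; the function name is also hypothetical. The controller drives the measured (offset) signal to the reference, so e(k) returns to zero while the true controlled variable settles at w minus the offset.

```python
def pi_loop_with_sensor_offset(offset, w=1.0, n_steps=400,
                               a=0.9, b=0.1, K_P=1.0, T0_over_TI=0.1):
    """Closed loop of a first-order process and a velocity-form PI controller.

    The controller only sees y_meas = y + offset. Returns the final control
    deviation e seen by the loop and the final true controlled variable y.
    Process and controller parameters are hypothetical.
    """
    y, u, e_prev, e = 0.0, 0.0, 0.0, 0.0
    for _ in range(n_steps):
        y_meas = y + offset                          # sensor with additive offset fault
        e = w - y_meas                               # control deviation available to the loop
        u += K_P * ((e - e_prev) + T0_over_TI * e)   # PI controller (velocity form)
        e_prev = e
        y = a * y + b * u                            # process with static gain b/(1-a) = 1
    return e, y


for offset in (0.0, 0.2):
    e_final, y_final = pi_loop_with_sensor_offset(offset)
    print(f"offset={offset}: e={e_final:.4f}, true y={y_final:.4f}")
```

In both runs the monitored deviation e is practically zero, so the offset is invisible in e(k) and y_meas(k); only the (normally unmeasured) true output reveals it.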
Fig. 12.1. Control loop with variables and fault influences
Table 12.1 shows the effect of larger faults on the closed-loop behavior. The different faults have similar effects on the considered changes of closed-loop behavior. In addition, some of these behaviors are also observed after external disturbances under normal operation, see Table 12.2. Therefore, it is not easily possible to diagnose the various faults by observing the listed properties. On the other hand, at least for larger plants with hundreds of control loops, an automatic fault detection for the control loops would be very practical. For example, [12.4] reports that up to 30 % of all loops oscillate in pulp and paper processes. In many cases they are then switched to manual mode or wrongly detuned, [12.5]. Stick-slip effects of valves are frequently mentioned as the reason (which is less the case for valves with a position controller).


12.2 Signal-based methods for closed-loop supervision 

The problem of control performance monitoring is treated in several publications. 
A survey is given, e.g. by [12.8] and [12.3]. First contributions assume a stochastic 
behavior of y(k) and a process with dead time, determine the variance n \ and re¬ 
late it to the output variance ct^y of an optimal minimum variance controller. This 
leads to a performance index I p = a^/a^ v > 1, [12.7]. The only knowledge on 
the process is the dead time of the discrete time process model. Modifications of 
this idea were followed by [12.18], [12.17]. [12.16] considered constraints on the 
control structure. [12.5] proposed methods to detect oscillations and sluggish con¬ 
trol. [12.19] suggested a performance index by relating the closed-loop settling time 
after a deterministic disturbance to the process time-delay. [12.8] modified the per¬ 
formance index by Harris and changed the MV-controller with one pole not in the 
origin and proposed methods for the detection of oscillations and the diagnosis of 
Table 12.1. Possible effects of large faults on closed loop behavior


fault 

j observation j 


sluggish 

oscillatory 

large control 

actuator at 


behavior 

behavior 

error 

restriction 

change of process 
structure & parameters 

V 

V 

V 

V 

actuator 

• friction 


V 

V 

V 

• backlash 


V 

V 

V 

• stuck 

V 


V 


sensor 





• offset 




V 

• gain 

V 

V 


V 

• variance 


V 

V 


controller 





• parameters 

V 

V 

V 

V 

• noise 



V 


• wrong tuning 

V 

V 

V 

V 


Table 12.2. Effects of external disturbances on closed loop behavior (well-tuned controller) 


disturbance 

| observation | 

sluggish 

behavior 

oscillatory 

behavior 

large control 
error 

actuator at 

restriction 

large, aperiodic 





• low frequent 



V 

V 

• medium frequent 



V 


• high frequent 



y 


periodic 





• low frequent 

V 




• medium frequent 


V 

V 

V 

• high frequent 


V 

V 

V 


valve stiction, see Section 12.3. However, none of these methods solves all diagnosis problems in closed loops, especially with normal operating data, which are not always stochastic, see also [12.3].

An advantage of using minimum-variance (MV) controllers as a reference is that they provide the smallest possible variance of the controlled variable for colored stochastic disturbances at the output. For a process without dead time

$$ G_p(z) = \frac{B(z^{-1})}{A(z^{-1})} \qquad (12.2) $$

the controlled variable then becomes a white-noise process with

$$ E\{y^2(k)\} = \sigma_y^2 = \lambda^2 \qquad (12.3) $$

where

$$ n(z) = \frac{D(z^{-1})}{C(z^{-1})}\, v(z) \qquad (12.4) $$

is the noise filter driven by white noise v(k) with variance λ². For processes with dead time it holds that

$$ E\{y^2(k)\} = \left[ 1 + f_1^2 + \dots + f_d^2 \right] \lambda^2 \qquad (12.5) $$

see [12.12], where the f_i depend on the noise model. However, MV controllers then require a high manipulation effort, so that generalized minimum-variance controllers, in which a weighting factor r for u(k) is used, are more practical. In addition, MV controllers exhibit poor control behavior for deterministic disturbances and show a remaining offset for lasting disturbances. Therefore, they do not seem to be good reference controllers for practical purposes.
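The bound (12.5) and the Harris-type index I_p can be evaluated with a short Python sketch. It rests on assumptions that are not stated explicitly in the text: v(k) is white noise with variance λ², C and D are monic, and the process has the usual one-sample delay of B(z^-1) plus the dead time d, so that the MV controller leaves the first d + 1 impulse-response coefficients 1, f_1, ..., f_d of D/C in the output. The function names are hypothetical.

```python
def impulse_coefficients(D, C, n):
    """First n impulse-response coefficients of D(z^-1)/C(z^-1).

    D and C are coefficient lists [1, d1, d2, ...] and [1, c1, c2, ...].
    """
    h = []
    for k in range(n):
        acc = D[k] if k < len(D) else 0.0
        for i in range(1, min(k, len(C) - 1) + 1):
            acc -= C[i] * h[k - i]        # from C(z^-1) * H(z^-1) = D(z^-1)
        h.append(acc)
    return h


def mv_variance(D, C, d, lam2):
    """Bound (12.5): [1 + f1^2 + ... + fd^2] * lambda^2 (assumed indexing)."""
    f = impulse_coefficients(D, C, d + 1)   # f[0] = 1 for monic D and C
    return sum(fi * fi for fi in f) * lam2


def harris_index(sigma_y2, D, C, d, lam2):
    """Performance index I_p = sigma_y^2 / sigma_MV^2."""
    return sigma_y2 / mv_variance(D, C, d, lam2)


# Example: noise filter 1/(1 - 0.8 z^-1), dead time d = 2, lambda^2 = 1
print(mv_variance([1.0], [1.0, -0.8], d=2, lam2=1.0))           # about 2.0496
print(harris_index(4.0992, [1.0], [1.0, -0.8], d=2, lam2=1.0))  # about 2.0
```

An index close to 1 indicates near-MV performance; in practice σ_y² is estimated from operating data and the noise model must itself be identified, which this sketch does not cover.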

A further investigation of signal-based methods was performed by [12.2]. Figure 12.2 shows the time responses to a step change of the reference variable for different faults with a size of 10 %. The following signal-based performance criteria were used to evaluate the effects of the different faults:

• overshoot:

$$ \Delta y_0 = y_{\max} - y(\infty) \qquad (12.6) $$

• settling time:

$$ T_s = N_s T_0 \quad \text{until} \quad |e(k)| < \varepsilon \qquad (12.7) $$

• root of mean-squared control deviation:

$$ \mathrm{rms}_e = S_e = \sqrt{\frac{1}{N} \sum_{k=0}^{N-1} e^2(k)} \qquad (12.8) $$

• root of mean-squared manipulation effort:

$$ \mathrm{rms}_u = S_u = \sqrt{\frac{1}{N} \sum_{k=0}^{N-1} \left[ u(k) - u(\infty) \right]^2} \qquad (12.9) $$

• number of zero crossings of the control deviation for 0 ≤ t ≤ T_s: K_e

• change of the steady-state value of the manipulated variable: Δu(∞)
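The criteria above can be computed from one recorded reference step as in the following Python sketch. It assumes that the last samples of y and u represent the steady state y(∞), u(∞) and that the record is long enough; the function name and dictionary keys are hypothetical.

```python
import math


def loop_criteria(w, y, u, T0, eps):
    """Signal-based criteria of (12.6)-(12.9) for one sampled reference step.

    The last samples of y and u are taken as the steady-state values y(inf),
    u(inf) (an assumption: the record must be long enough).
    """
    e = [wk - yk for wk, yk in zip(w, y)]
    N = len(e)
    y_inf, u_inf = y[-1], u[-1]
    overshoot = max(y) - y_inf                                   # (12.6)
    last_outside = max((k for k in range(N) if abs(e[k]) >= eps), default=-1)
    N_s = last_outside + 1                                       # |e| < eps from N_s on
    S_e = math.sqrt(sum(ek * ek for ek in e) / N)                # (12.8)
    S_u = math.sqrt(sum((uk - u_inf) ** 2 for uk in u) / N)      # (12.9)
    K_e = sum(1 for k in range(1, N_s) if e[k - 1] * e[k] < 0)   # sign changes up to T_s
    return {"overshoot": overshoot, "T_s": N_s * T0, "S_e": S_e,   # T_s as in (12.7)
            "S_u": S_u, "K_e": K_e, "u_inf": u_inf}


c = loop_criteria(w=[1.0] * 5, y=[0.0, 0.8, 1.2, 0.95, 1.0],
                  u=[2.0, 1.5, 0.8, 1.1, 1.0], T0=0.1, eps=0.1)
print(c)
```

Δu(∞) is then the difference between u_inf of a run with a fault and u_inf of the fault-free run.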

The faults F1 and F5 have no effect on the controlled variable y(t) and the manipulated variable u(t). F4 does not influence y(t) but leads to a drastic reduction of u(k). An increased process gain F2 or an increased sensor gain F6 forces the loop into strong oscillations and lowers the steady-state value of u(k). In addition, an increase of Coulomb friction finally leads to oscillations with much lower frequency and much lower settling time.
Fig. 12.2. Simulation of closed-loop behavior for a step change of the reference variable. The
controlled process consists of a 2nd order actuator with small Coulomb friction, a 2nd order 
linear process and a 1st order sensor. The controller is an optimized PID-controller, [12.2]. The 
size of the fault is 10 % of the nominal value. 
These simulations show that the controller compensates for all faults as time increases and that in some cases the controlled variable becomes more oscillatory. However, the manipulated variable shows a different time behavior for all faults except F5. Therefore, the behavior of the manipulated variable should be included in the performance evaluation of closed loops.

The simulation results are summarized in Table 12.3. Applying the 6 criteria gives different patterns for all faults, which means that the faults can be isolated. This shows that several different criteria have to be used for fault detection in closed loops.


Table 12.3. Effect of positive 10 % faults on the simulated closed loop of Figure 12.2 


faults 

overshoot 

Δy_0

manipulated 

variable 

Δu_∞

number of 
zero crossing 

K e 

settling 

time 

T s 

rms e 

Se 

rms u 

S u 

F1

0 

— 

0 

0 

0 

+ 

F2 

++ 

— 

++ 

++ 

++ 

— 

F3 

++ 
-

++ 

++ 

++ 

F4 

0 

— 

0 

0 

0 

0 

F5 

0 

0 

0 

0 

0 

0 

F6 

++ 

- 

+ 

+ 

- 

- 


The described procedure with a step change of the reference variable w(k) can be applied for testing closed loops by an active experiment if the operating conditions of the process allow it. For servo control systems or actuator position loops, a step change of w(k) is quite a natural excitation and may be used for testing without difficulties. If, however, the closed loop has to be supervised continuously for all kinds of disturbances (stochastic, single pulses, drift), more advanced methods are required.

One possibility to supervise the control performance is to calculate continuously the root-mean-squared control deviation S_e and the corresponding root-mean-squared manipulation effort S_u, see (12.8) and (12.9). It was shown in [12.12] that, in the S_e–S_u plane, different controllers for a given process yield results in characteristic areas, compare Figure 12.3. These areas depend on the process as well as on the controllers and disturbances.


Figure 12.4 shows the change of control performance for the process

$$ G_p(s) = \frac{y(s)}{u(s)} = \frac{K_p}{s^2 + 2 D \omega_0 s + \omega_0^2}, \qquad K_p = 0.2, \quad \omega_0 = 5, \quad D = 0.9 \qquad (12.10) $$

caused by deviations of the process parameters from the nominal ones for which the controller was optimized, [12.2]. For larger process parameter changes the performance criteria leave an area which can be defined as a tolerance zone around the nominal point.
Fig. 12.3. Mean-squared control deviation S_e and mean-squared manipulation effort S_u for different controllers and processes: (a) and (c) step changes of the reference value w, (b) stochastic disturbances n at the process input, [12.11], [12.12]. Process 1: low-pass process, m = 3, d = 1; process 2: non-minimum-phase process, m = 2; process 3: low-pass process, m = 2. DB: deadbeat controller; MV: minimum-variance controller; SC: state controller; LCPA: linear controller, pole assignment

Therefore, a tolerance zone ΔS_e, ΔS_u around the nominal performance criteria S_en, S_un can be defined. In addition, the control performance ratio

$$ \eta_{eu} = S_e / S_u \qquad (12.11) $$

can be used. Hence, generally applicable performance criteria for closed-loop supervision are

$$ S_e(k) > S_{en} + \Delta S_e \quad \text{AND} \quad S_u(k) > S_{un} + \Delta S_u \qquad (12.12) $$

Calculating the autocorrelation function (ACF) of the control deviation

$$ R_{ee}(\tau) = \frac{1}{N} \sum_{k=1}^{N} e(k)\, e(k - \tau) \qquad (12.13) $$

indicates the kind of external or internal excitation of the closed loop. In this way, more or less colored disturbances (noise) or periodic signals can be detected with little computational effort.
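A direct Python implementation of the ACF estimate (12.13), given as a sketch: products that would need samples before the start of the record are simply omitted (a common biased estimator), which is an implementation choice not prescribed by the text. The function name is hypothetical.

```python
import math


def acf(e, max_lag):
    """Estimate R_ee(tau), tau = 0..max_lag, as in (12.13).

    Products with samples before the start of the record are omitted, so the
    estimate is divided by N although only N - tau products are summed.
    """
    N = len(e)
    return [sum(e[k] * e[k - tau] for k in range(tau, N)) / N
            for tau in range(max_lag + 1)]


# A periodic control deviation (period 20 samples) gives a periodic ACF
e_periodic = [math.sin(2 * math.pi * k / 20) for k in range(2000)]
R = acf(e_periodic, 20)
print(R[10] / R[0], R[20] / R[0])   # close to -1 and +1
```

For a periodic deviation the normalized ACF returns to nearly +1 after one period, whereas for uncorrelated noise it drops towards zero for all lags τ > 0.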
Fig. 12.4. Effect of process parameter changes on the control performance for a 2nd order
process with P-controller and step reference value change, [12.2] 


All the mentioned performance criteria (pc) can be calculated as an average over a time window of length M

$$ \overline{pc}(k) = \frac{1}{M} \sum_{i=k-M+1}^{k} pc(i) \qquad (12.14) $$

or as an average with exponential forgetting

$$ \overline{pc}(k) = \alpha\, \overline{pc}(k-1) + (1 - \alpha)\, pc(k) \qquad (12.15) $$
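The two averaging schemes and the supervision rule (12.12) can be sketched in Python as follows. The class and function names are hypothetical; the window average simply averages the available values during start-up, which is an assumption not specified in the text.

```python
from collections import deque


class WindowAverage:
    """Average over the last M values, cf. (12.14).

    During start-up (fewer than M values) the available values are averaged.
    """

    def __init__(self, M):
        self.buf = deque(maxlen=M)
        self.total = 0.0

    def update(self, pc):
        if len(self.buf) == self.buf.maxlen:
            self.total -= self.buf[0]      # value that drops out of the window
        self.buf.append(pc)
        self.total += pc
        return self.total / len(self.buf)


class ForgettingAverage:
    """Average with exponential forgetting, cf. (12.15)."""

    def __init__(self, alpha, initial=0.0):
        self.alpha = alpha
        self.value = initial

    def update(self, pc):
        self.value = self.alpha * self.value + (1.0 - self.alpha) * pc
        return self.value


def closed_loop_alarm(S_e, S_u, S_en, S_un, dS_e, dS_u):
    """Rule (12.12): both criteria have left their tolerance zones."""
    return S_e > S_en + dS_e and S_u > S_un + dS_u
```

For example, feeding e²(k) into either average and taking the square root gives a running version of S_e in (12.8), which can be passed to `closed_loop_alarm` together with a running S_u obtained in the same way. The forgetting factor α trades responsiveness against noise suppression in the same way as the window length M.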


12.3 Methods for the detection of oscillations in closed loops 

As a partial solution to closed-loop performance supervision, the automatic detection of different kinds of oscillations is important. Oscillations are usually the result of too aggressive controller action, of nonlinearities like friction, backlash or saturation, e.g. in actuators or in the process itself, or of signal quantization and external periodic disturbances.

A relatively simple method, which calculates the integral of the absolute error (IAE) between successive zero crossings,

$$ \mathrm{IAE} = \int_{t_{i-1}}^{t_i} |e(t)|\, dt \qquad (12.16) $$

was proposed by [12.5]. If

$$ \mathrm{IAE} > 2a/\omega_i \qquad (12.17) $$

then a load disturbance is likely to be present, where, e.g., ω_i = 2π/T_I. T_I is the integral time of a PID controller and a is an oscillation amplitude, e.g. 1 %. If the number n of load detections becomes high, n > n_lim, an oscillation is likely to be present, with, e.g., n_lim = 10.
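A Python sketch of this detection logic, under simplifying assumptions: rectangle-rule integration of the sampled deviation, the amplitude a given in the same units as e, and load detections counted over the whole record rather than within a limited supervision time window as a practical implementation would do. The function names are hypothetical.

```python
import math


def iae_between_zero_crossings(e, T0):
    """IAE (12.16) of the sampled deviation e between successive zero crossings.

    Rectangle-rule integration; the incomplete segments before the first and
    after the last zero crossing are not returned.
    """
    segments, acc, seen_crossing = [], 0.0, False
    for k in range(len(e)):
        if k > 0 and e[k - 1] * e[k] < 0:       # sign change = zero crossing
            if seen_crossing:
                segments.append(acc)
            seen_crossing, acc = True, 0.0
        acc += abs(e[k]) * T0
    return segments


def oscillation_detected(e, T0, T_I, a, n_lim=10):
    """Count load detections IAE > 2a/omega_i (12.17) and compare with n_lim."""
    omega_i = 2 * math.pi / T_I
    n_load = sum(1 for iae in iae_between_zero_crossings(e, T0) if iae > 2 * a / omega_i)
    return n_load > n_lim
```

A sustained oscillation whose half-period IAE exceeds 2a/ω_i produces one load detection per half period and therefore quickly exceeds n_lim, while a small oscillation below this limit is tolerated.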

Another possibility is to calculate the cross-correlation function (ccf) between the process input and output, [12.9], [12.8]:

$$ R_{uy}(\tau) = \frac{1}{N} \sum_{k=1}^{N} u(k)\, y(k - \tau) \qquad (12.18) $$

Oscillating external disturbances and an unstable loop lead to a phase shift of about π, and therefore the ccf becomes an even function. Static friction, on the other hand, results in a phase shift of about π/2 and therefore in an odd ccf.
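The even/odd distinction can be checked numerically. The Python sketch below estimates (12.18) for positive and negative lags and splits it into its even and odd parts; the two test signals are idealized sinusoids with phase shifts π and π/2, chosen only for illustration, and the function names are hypothetical.

```python
import math


def ccf(u, y, max_lag):
    """R_uy(tau) = 1/N * sum_k u(k) y(k - tau) as in (12.18), tau = -max_lag..max_lag."""
    N = len(u)
    R = {}
    for tau in range(-max_lag, max_lag + 1):
        ks = range(max(0, tau), min(N, N + tau))   # keep 0 <= k - tau < N
        R[tau] = sum(u[k] * y[k - tau] for k in ks) / N
    return R


def even_odd_energy(R, max_lag):
    """Energy of the even and the odd part of R(tau) for tau = 1..max_lag."""
    even = sum(((R[t] + R[-t]) / 2) ** 2 for t in range(1, max_lag + 1))
    odd = sum(((R[t] - R[-t]) / 2) ** 2 for t in range(1, max_lag + 1))
    return even, odd


theta = 2 * math.pi / 40                  # oscillation with a period of 40 samples
u = [math.sin(theta * k) for k in range(2000)]
for shift in (math.pi, math.pi / 2):      # phase shift between u and y
    y = [math.sin(theta * k - shift) for k in range(2000)]
    even, odd = even_odd_energy(ccf(u, y, 10), 10)
    print(f"shift={shift:.2f}: even={even:.3g}, odd={odd:.3g}")
```

With a phase shift of π the odd part practically vanishes, with π/2 the even part does; real signals are noisier, so a ratio threshold rather than an exact test would be needed.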

To distinguish between sinusoidal signals and other periodic signals like rectangular, trapezoidal or triangular oscillations, a Fourier analysis can be performed. Besides a peak at the fundamental frequency ω_1, further peaks at 3ω_1, 5ω_1, ... appear for oscillations stemming from nonlinearities like friction or backlash, see Figure 12.5, [12.2]. The calculation of the Fourier magnitudes may be limited to a few distinct frequencies.



Fig. 12.5. Amplitude spectrum of different oscillations (abscissa: frequency [Hz])
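Evaluating the Fourier magnitude only at a few distinct frequencies, as suggested above, can be sketched in Python with a single-frequency DFT. The sampling time, fundamental frequency and test signals are hypothetical; the record is chosen to contain an integer number of periods so that the evaluated frequencies fall exactly on DFT bins.

```python
import math


def amplitude_at(x, freq, T0):
    """Amplitude of the component freq [Hz] of x, evaluated by a single-frequency DFT."""
    N = len(x)
    re = sum(xk * math.cos(2 * math.pi * freq * k * T0) for k, xk in enumerate(x))
    im = sum(xk * math.sin(2 * math.pi * freq * k * T0) for k, xk in enumerate(x))
    return 2.0 * math.hypot(re, im) / N


T0, f1 = 0.01, 0.5                         # 200 samples per period of the fundamental
sine = [math.sin(2 * math.pi * f1 * k * T0) for k in range(2000)]
square = [1.0 if (k % 200) < 100 else -1.0 for k in range(2000)]
for name, x in (("sine", sine), ("square", square)):
    ratio = amplitude_at(x, 3 * f1, T0) / amplitude_at(x, f1, T0)
    print(name, round(ratio, 3))
```

The ratio of the third harmonic to the fundamental is practically zero for the sinusoid and close to 1/3 for the rectangular oscillation, which is the pattern expected for friction- or backlash-induced limit cycles.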


12.4 Model-based methods for closed-loop supervision 

Since, for a controller with fixed parameters, changes of the closed-loop behavior result from process parameter changes and external disturbances, another way to supervise control loops is to observe changes of a process model and a disturbance model. The problem is then identical to that of information gathering for adaptive control. In particular, the parameter-estimation methods of model-identification adaptive systems (MIAS), also called self-tuning controllers, can be applied here, e.g. [12.15] and [12.1]. However, the conditions for closed-loop identification with discrete-time models then have to be satisfied if no external perturbations are used. These conditions are:

• model order and dead time exactly known;

• controller order larger than or equal to the difference between model order and dead time: ν ≥ m - d;

• enough natural excitation of the process.

Based on the parameter estimates, changes of the process are detected as a possible source of deterioration of closed-loop performance. Furthermore, the methods for the supervision of adaptive control systems can be applied, see [12.14], [12.15] and [12.6]. However, also for adaptive controllers it is difficult to decide whether output deviations stem from disturbances or from process parameter changes.

Now the application of parity equations in closed loop is considered. As shown in Figure 12.6, a residual is generated by using a fixed process model and calculating a polynomial (equation) error or an output error, as described in Section 10.1, but now in closed loop.

Fig. 12.6. Fault detection of closed loop with parity equations and output error r'

The output error then is

$$ r'(s) = y_p(s) - y_m(s) = y_p(s) - G_m(s)\, u(s) $$

By introducing (and omitting the Laplace variable s)

$$ y_p = G_p u + n, \qquad u = \frac{G_c}{1 + G_c G_p} (w - n) \qquad (12.19) $$

one obtains

$$ \begin{aligned} r' &= (G_p - G_m)\, u + n = (G_p - G_m) \frac{G_c}{1 + G_c G_p} (w - n) + n \\ &= \frac{G_c (G_p - G_m)}{1 + G_c G_p}\, w + \frac{1 + G_c G_m}{1 + G_c G_p}\, n \end{aligned} \qquad (12.20) $$


Hence, if the model does not exactly agree with the real process, the residual depends on both inputs w and n. If, however, they agree, G_p = G_m, it holds that

$$ r'(s) = n(s) \qquad (12.21) $$

The residual then depends only on the external disturbance and, for additive faults at the process input and output (compare Figure 10.1 and (10.4)), on these faults:

$$ r'(s) = G_m(s)\, f_u(s) + f_y(s) + n(s) \qquad (12.22) $$
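A discrete-time Python sketch of (12.19) to (12.21), assuming a hypothetical first-order process x(k) = a x(k-1) + b u(k-1) with output disturbance n(k), a P-controller, and a parallel model driven by the measured manipulated variable. All parameter values and the function name are hypothetical. With an exact model the output error reproduces n(k) regardless of the controller; with a mismatched model gain the reference and the loop dynamics leak into the residual, as (12.20) predicts.

```python
import math


def closed_loop_output_error(n, w=1.0, a=0.9, b=0.1, a_m=0.9, b_m=0.1, K_P=2.0):
    """Output error r'(k) = y_p(k) - y_m(k) of a P-controlled first-order loop.

    Process:  x(k) = a x(k-1) + b u(k-1),  y_p(k) = x(k) + n(k)
    Model:    y_m(k) = a_m y_m(k-1) + b_m u(k-1), driven by the measured u
    """
    x = y_m = u_prev = 0.0
    r = []
    for nk in n:
        x = a * x + b * u_prev
        y_m = a_m * y_m + b_m * u_prev
        y_p = x + nk
        r.append(y_p - y_m)
        u_prev = K_P * (w - y_p)       # P-controller acting on the measured output
    return r


n = [0.05 * math.sin(0.3 * k) for k in range(300)]   # deterministic test disturbance
r_exact = closed_loop_output_error(n)
r_mismatch = closed_loop_output_error(n, b_m=0.12)
print(max(abs(rk - nk) for rk, nk in zip(r_exact, n)))
print(max(abs(rk - nk) for rk, nk in zip(r_mismatch, n)))
```

The first printed deviation is at rounding-error level, the second is of the order of the steady-state model error, which illustrates why model accuracy governs the usable residual thresholds in closed loop.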


Applying the polynomial or equation error, see (10.5),

$$ r(s) = A_m(s)\, y_p(s) - B_m(s)\, u(s) \qquad (12.23) $$

with

$$ G_p(s) = \frac{B_p(s)}{A_p(s)}, \qquad G_c(s) = \frac{Q(s)}{P(s)} $$

and

$$ y_p = \frac{B_p}{A_p}\, u + n, \qquad u = \frac{Q A_p}{P A_p + Q B_p} (w - n) $$

leads to

$$ r = \frac{Q (A_m B_p - A_p B_m)}{P A_p + Q B_p}\, w + \frac{A_p (P A_m + Q B_m)}{P A_p + Q B_p}\, n \qquad (12.24) $$

If process and model agree, A_p = A_m, B_p = B_m, then

$$ r(s) = A_m(s)\, n(s) \qquad (12.25) $$

Also here the residual depends only on external disturbances, and for additive process input and output faults it follows that

$$ r(s) = B_m(s)\, f_u(s) + A_m(s)\, f_y(s) + A_m(s)\, n(s) \qquad (12.26) $$

Hence, for exact agreement of process and model, the output and polynomial residuals depend only on the disturbance and the process faults, in the same way as for the process in open loop, as a comparison with (10.4) and (10.6) shows.

This means that the same procedure for fault detection with parity equations based on transfer functions can be applied to linear closed loops as to open loops. Therefore, for small disturbances n, especially additive faults in the actuator, process and sensor can be detected, for example sensor offsets, increased Coulomb friction or backlash in actuators. (The last two lead to direction-dependent residuals, f_u = sign u or f_u = sign u̇, see [12.13].) However, if the disturbances n become large, the thresholds for the residuals have to be widened, and then only large process faults can be detected. Moreover, the residuals do not indicate deviations if the process oscillates because of wrong controller tuning or unstable closed-loop behavior, as long as the model describes the process accurately.

This also allows the parity equations to be used to reconstruct the disturbance n, under the assumption that no faults or changes occur on the process side and that process model and process agree well. The disturbance signal is then

$$ n(s) = r'(s) \qquad \text{or} \qquad n(s) = \frac{1}{A_m(s)}\, r(s) \qquad (12.27) $$

In this way, the kind of disturbance can be observed, e.g. stochastic, deterministic, drift or periodic.

[12.2] applied on-line parameter estimation of the process model and residual generation by parity equations to the temperature control of a steam-heated heat exchanger.

In summary, parity equations are suitable for use in closed loop to detect faults in the process or extraordinary disturbances, which may be the cause of changed control performance. Combined with the methods for oscillation detection of Section 12.3, they result in a good overall coverage of closed-loop faults.


Table 12.4. Fault-detection methods for closed loop in normal operating condition (no test signals) for some faults (√ means applicable)


Faults of 
closed loop 

Signal-based methods 

Process model- 

based methods 


kl 

u u lim 

IAE 

acf 

e 

ccf 

u, y 

Fourier 

anal. 

Se,Su 

param. 

estim. 

parity 

equat. 

sluggish 

behavior 

V 

V 

V 




V 



oscillations 

instability 

V 

V 

V 

V 

V 

V 

V 



oscillations 

external 

disturbance 

V 

V 

V 

V 

V 

V 

V 


V 

friction 

V 

V 

V 


-J 

V 

V 

V 

V 

backlash 

V 

V 

V 


V 

V 

V 

V 

V 

sensor 

offset 


V 





V 


V 

sensor 

variance 

V 


s/ 

V 



V 


V 

controller 

detuned 

V 

V 

V 

V 



V 

V 
The discussion of several methods for fault detection in closed loops has shown that it is difficult to decide whether large deviations in closed loops are caused by large disturbances (stochastic, periodic, non-periodic, drift) or by faults in the controller (parameters), actuator, process dynamics or sensors. None of the discussed methods is able to detect all these faults. But by combining several detection methods, a large portion of closed-loop faults is detectable and isolable. Table 12.4 shows the applicability of the discussed fault-detection methods. Depending on the expected faults, suitable methods should be combined to enable a diagnosis of the faults. A special issue on control performance monitoring is [12.10].


12.5 Problems 

1) What are typical changes in the performance of a closed loop caused by improperly implemented or tuned controllers and by faults of components?

2) What kinds of faults can be observed for a closed loop by monitoring the controlled variable and the manipulated variable? Which faults can be differentiated?

3) List all methods for the detection of oscillations in closed loops. 

4) Design a model-based fault detection with parity equations for a 2nd order linear 
process and a proportional controller according to Figure 12.6. 
Fault detection with Principal Component Analysis (PCA)¹


For large-scale processes, such as chemical plants, the development of model-based fault-detection methods requires considerable and possibly excessive effort. In such cases, data-driven analysis methods offer an alternative. Methods based on multivariate statistical analysis, in particular Principal Component Analysis (PCA) and Projection to Latent Structures (PLS), have received attention, see, e.g. [13.9], [13.10], [13.1], [13.3].

These methods are attractive where the available process measurements are highly correlated but only a small number of events (faults) produce unusual patterns [13.11]. When the process data are highly correlated, the original process data can be projected onto a smaller number of principal components (or latent variables), thus reducing the dimension of the variables. PCA models are usually linear and static and are developed from a process in normal operation. However, they can be extended to other situations.


13.1 Principal components 

The basic idea of principal component analysis is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set. This is achieved by transforming the measured data to a new set of variables, the principal components, which are uncorrelated. These principal components are ordered so that the first few retain most of the variation present in all of the original variables [13.5], [13.6].

It is assumed that x is a vector of a large number m of random variables (measurement signals) and that the variances of the random variables and the structure of the covariances or correlations between the m variables are of interest. x can contain input and output variables of a process.

Now a reduced set of a considerably smaller number r < m of variables is sought that preserves most of the information contained in these variances and covariances. This is obtained by a set of orthogonal vectors in the directions where most of the data variation occurs. Then a few principal components are sufficient to capture the data variance, see Figure 13.1.

¹ according to a presentation by Falko Haus, [13.4]
[Figure 13.1: original data X (m variables) → principal components → data T (r < m variables) → corrected data X* with principal components]

Fig. 13.1. Generation of a reduced set of variables T containing only significant variables from the original variables X by principal component analysis. Back-transformation into original data coordinates X* if required.


To illustrate the problem, N = 8 measurements of the two variables $x_1(k)$ and $x_2(k)$ are considered, see Figure 13.2. The measurements are represented in two vectors

$$ x_1^T = [x_1(1), x_1(2), \ldots, x_1(8)], \qquad x_2^T = [x_2(1), x_2(2), \ldots, x_2(8)] \qquad (13.1) $$

A data matrix

$$ X = [x_1 \; x_2] \qquad (13.2) $$

is formed, containing all measured data within the coordinate system $(x_2|x_1)$. As Figure 13.2 shows, the measured data fluctuate in the directions of both coordinates $x_1$ and $x_2$.

Now a transformation to a new coordinate system $(t_2|t_1)$ is sought in which the variance of the data is maximal in the direction of $t_1$ and second largest in the direction of $t_2$, thus forming the first and second principal components. A further condition is that $(t_2|t_1)$ forms an orthogonal coordinate system.

Figure 13.3 shows the results. Hence, the data matrix X is transformed into a new data matrix

$$ T = [t_1 \; t_2] \qquad (13.3) $$
Fig. 13.2. Plot of 8 measurements of two variables


The variance of the data is now very large in the direction of $t_1$ and much smaller in the direction of $t_2$. Therefore the single principal component $t_1$ describes the variance of the data sufficiently, and the variable $t_2$ can be neglected.



Fig. 13.3. Plot of 8 measurements of two variables with principal components 
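The two-variable illustration can be reproduced numerically. The eight data points below are invented for illustration (they are not the data of Figure 13.2), and the columns are made zero-mean, an assumption the procedure later in this chapter also makes. The eigenvectors of $X^T X$ define the orthogonal axes $t_1$ and $t_2$, and the share of $t_j^T t_j$ in the total shows that $t_1$ carries almost all of the variation.

```python
import numpy as np

# Eight invented measurements of two strongly correlated variables (illustrative only)
x1 = np.array([-3.1, -2.0, -1.2, -0.4, 0.5, 1.3, 2.1, 2.8])
x2 = np.array([-2.7, -2.2, -0.8, -0.6, 0.7, 1.0, 2.3, 2.3])
X = np.column_stack([x1, x2])        # data matrix X = [x1 x2], N x m = 8 x 2
X = X - X.mean(axis=0)               # zero-mean columns (assumption)

eigvals, P = np.linalg.eigh(X.T @ X) # orthonormal eigenvectors, ascending eigenvalues
order = np.argsort(eigvals)[::-1]    # sort so that t1 carries the largest variance
eigvals, P = eigvals[order], P[:, order]

T = X @ P                            # coordinates of the data in the (t2|t1) system
var_t = np.sum(T ** 2, axis=0)       # t_j^T t_j for j = 1, 2
share = var_t / var_t.sum()          # fraction of the total variation per component
print(share)                         # t1 carries nearly all of the variation
```

Because P is orthonormal, the rotation preserves the total variation; it only redistributes it so that the first axis collects as much of it as possible.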

The general task consists in transforming a data matrix with m variables $x_j(k)$

$$ X = [x_1 \; x_2 \; \ldots \; x_m] \qquad (13.4) $$

with N measurements k = 1, 2, ..., N into a new data matrix

$$ T = [t_1 \; t_2 \; \ldots \; t_r] \qquad (13.5) $$

with also N measurements, but of smaller dimension r < m. This can be obtained through a transformation matrix P

$$ T_{[N \times r]} = X_{[N \times m]}\, P_{[m \times r]} \qquad (13.6) $$

$$ P = [p_1 \; p_2 \; \ldots \; p_r] \qquad (13.7) $$

As this transformation is a rotation, i.e. P has orthonormal columns, it holds

$$ P^T P = I \qquad (13.8) $$

Therefore also

$$ X = T P^T \qquad (13.9) $$

is valid (exactly for r = m, approximately for r < m). In multivariate statistics terminology, T is called the score matrix and P the loading matrix. (13.9) can also be written as

$$ X = t_1 p_1^T + t_2 p_2^T + \cdots + t_r p_r^T = \sum_{j=1}^{r} t_j p_j^T \qquad (13.10) $$

To find the elements $p_j$ of the transformation matrix P that lead to maximal variances, a stepwise optimization has to be solved. For each step j with

$$ t_j = X p_j \qquad (13.11) $$

a maximal variance of the data $t_j$ means

$$ \max t_j^T t_j = \max (X p_j)^T (X p_j) = \max p_j^T X^T X p_j \qquad (13.12) $$

under the constraint (13.8)

$$ p_j^T p_j = 1 \qquad (13.13) $$

which means that the components are orthonormal.

A standard approach for this optimization problem is the method of Lagrange multipliers [13.6]. If the function $f(p_j)$ has to be maximized under the condition $g = p_j^T p_j - 1 = 0$, the loss function becomes

$$ V = f(p_j) - \lambda_j\, g(p_j) \qquad (13.14) $$

where $\lambda_j$ is the Lagrange multiplier. This leads to

$$ V = p_j^T X^T X p_j - \lambda_j (p_j^T p_j - 1) \qquad (13.15) $$

and

$$ \frac{\partial V}{\partial p_j} = 2 X^T X p_j - 2 \lambda_j p_j = 0 \qquad (13.16) $$

or

$$ [X^T X - \lambda_j I]\, p_j = 0 \qquad (13.17) $$

With

$$ A = X^T X \qquad (13.18) $$

it holds

$$ [A - \lambda_j I]\, p_j = 0 \qquad (13.19) $$
(13.19) 
Hence, this is a classical eigenvalue problem. For zero-mean variables, A is proportional to the covariance matrix (or, for standardized variables, the correlation matrix) of the measured data; $\lambda_j$ is an eigenvalue and $p_j$ an eigenvector of the matrix A. From (13.19) it follows that

$$ p_j^T A p_j = p_j^T \lambda_j p_j $$

and inserting this in (13.12) yields, with (13.13), for the maximal variance

$$ \max t_j^T t_j = \max p_j^T \lambda_j p_j = \max \lambda_j \qquad (13.20) $$

Therefore the largest eigenvalues $\lambda_j$ give maximal variance of the coordinates $t_j$.
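The result (13.20) can be checked numerically. The sketch below uses randomly generated correlated data (an assumption, not process data), solves (13.19) with NumPy's symmetric eigensolver `np.linalg.eigh`, and confirms that $t_1^T t_1$ equals the largest eigenvalue and that no random unit vector p yields a larger quadratic form $p^T A p$.

```python
import numpy as np

rng = np.random.default_rng(0)                   # illustrative correlated data
M = np.array([[3.0, 1.0, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ M
X = X - X.mean(axis=0)                           # zero-mean variables

A = X.T @ X                                      # (13.18)
lam, vecs = np.linalg.eigh(A)                    # solves [A - lambda_j I] p_j = 0, (13.19)
lam1, p1 = lam[-1], vecs[:, -1]                  # largest eigenvalue and its eigenvector

t1 = X @ p1                                      # (13.11)
print(t1 @ t1, lam1)                             # equal: t1^T t1 = lambda_1, cf. (13.20)

# Rayleigh quotient check: no random unit vector p gives a larger p^T A p
trial = rng.normal(size=(1000, 3))
trial /= np.linalg.norm(trial, axis=1, keepdims=True)
quad = np.einsum("ij,jk,ik->i", trial, A, trial)
print(quad.max() <= lam1)
```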

The procedure to determine the transformation matrix P and the new variables T is as follows, compare Figure 13.1:

1) Calculation of the "correlation matrix"

$$ A = X^T X $$

with zero-mean variables $E\{x_j(k)\} = 0$ and unit variance $E\{x_j^2(k)\} = 1$ [13.13].

2) Calculation of the eigenvalues $\lambda_j$ of the matrix A and the eigenvectors $p_j$ from

$$ (A - \lambda_j I)\, p_j = 0, \qquad j = 1, \ldots, m $$

3) Selection of the largest (most significant) eigenvalues $\lambda_j$ and the corresponding eigenvectors $p_j$, j = 1, ..., r, leading to the approximation

$$ X^* = t_1 p_1^T + t_2 p_2^T + \cdots + t_r p_r^T $$

4) Determination of the transformation matrix P

$$ P = [p_1 \; p_2 \; \ldots \; p_r] $$

5) Calculation of the new data matrix

$$ T = X P = [t_1 \; t_2 \; \ldots \; t_r] $$

with $t_j = X p_j$. The result is a new data matrix T with all original data but a reduced number r < m of coordinates or variables, i.e. the principal components. The principal component data matrix T carries approximately the same information on the variances of the variables as the original data matrix X.

6) Back-transformation into the original data coordinate system yields

$$ X^*_{[N \times m]} = T_{[N \times r]}\, P^T_{[r \times m]} = X_{[N \times m]}\, P_{[m \times r]}\, P^T_{[r \times m]} \qquad (13.21) $$

By back-transformation into the original data coordinate system one obtains the original variables with only significant variances, i.e. insignificant noise effects have been removed.
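Steps 1) to 6) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: standardization uses the sample mean and standard deviation of the given data, A is left unnormalized (scaled by N), and the back-transformed data are rescaled to the original units. The helper names `pca_fit`, `pca_transform` and `pca_back_transform` are hypothetical.

```python
import numpy as np

def pca_fit(X, r):
    """Steps 1-4: standardize, solve the eigenproblem of A = X^T X, keep r components."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std                    # step 1: zero mean, unit variance
    A = Xs.T @ Xs                            # "correlation matrix" (scaled by N)
    lam, vecs = np.linalg.eigh(A)            # step 2: eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:r]        # step 3: r most significant eigenvalues
    P = vecs[:, order]                       # step 4: P = [p_1 ... p_r]
    return mean, std, P, lam[order]

def pca_transform(X, mean, std, P):
    """Step 5: scores T = X P of the standardized data."""
    return ((X - mean) / std) @ P

def pca_back_transform(T, mean, std, P):
    """Step 6: X* = T P^T, converted back to the original units, cf. (13.21)."""
    return (T @ P.T) * std + mean

# Usage with illustrative data: three measured variables driven by two sources
rng = np.random.default_rng(1)
s = rng.normal(size=(500, 2))
X = np.column_stack([s[:, 0], s[:, 0] + s[:, 1], s[:, 1]])
X = X + 0.05 * rng.normal(size=(500, 3))     # assumed measurement noise
mean, std, P, lam = pca_fit(X, r=2)
X_star = pca_back_transform(pca_transform(X, mean, std, P), mean, std, P)
print(np.mean((X - X_star) ** 2))            # small: mostly noise has been removed
```

Since the third variable is (up to noise) a linear combination of the other two, two principal components describe the data, and the difference X - X* consists essentially of noise.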
So far, principal component analysis has been described for steady-state (static) behavior with fluctuating variables. By expanding the data with delayed samples for discrete-time models

$$ X = [x_1(k) \; x_1(k-1) \; \ldots \; x_2(k) \; x_2(k-1) \; \ldots] \qquad (13.22) $$

or with derivatives for continuous-time modelling

$$ X = [x_1 \; \dot{x}_1 \; \ldots \; x_2 \; \dot{x}_2 \; \ldots] \qquad (13.23) $$

PCA can also be developed for dynamic processes.
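Building the expanded data matrix (13.22) is mostly index bookkeeping. The sketch below is one possible arrangement (the hypothetical helper `lagged_data_matrix` and its column ordering are assumptions): for each variable it places the current sample and the delayed samples side by side and drops the first rows, whose past values are unavailable.

```python
import numpy as np

def lagged_data_matrix(signals, lags):
    """Build X = [x1(k), x1(k-1), ..., x1(k-lags), x2(k), ...] as in (13.22).

    signals: array of shape (N, m), one column per measured variable.
    Returns an array of shape (N - lags, m * (lags + 1)); the first `lags`
    samples are dropped because their past values are not available.
    """
    N, m = signals.shape
    cols = []
    for j in range(m):                     # variable x_j
        for d in range(lags + 1):          # delay d: x_j(k - d)
            cols.append(signals[lags - d : N - d, j])
    return np.column_stack(cols)

# Usage: rows are [u(k), u(k-1), y(k), y(k-1)] for k = 1, ..., 4
u = np.arange(5.0)
y = 10.0 * np.arange(5.0)
Xd = lagged_data_matrix(np.column_stack([u, y]), lags=1)
print(Xd[0])    # u(1), u(0), y(1), y(0) = 1, 0, 10, 0
```

The resulting matrix can be passed to the same static PCA procedure; the number of delays is a design choice that should match the assumed model order.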

Example 13.1: Principal component analysis of a linear first order process

Figure 13.4 shows a dynamic first order process with transfer function

$$ G(s) = \frac{y(s)}{u(s)} = \frac{K}{1 + T s} $$

and normally distributed white output noise n(t). The data matrix is

$$ X = [u \; y \; \dot{y}] = \begin{bmatrix} u(0) & y(0) & \dot{y}(0) \\ u(1) & y(1) & \dot{y}(1) \\ \vdots & \vdots & \vdots \\ u(N) & y(N) & \dot{y}(N) \end{bmatrix} $$

For input excitation a sinusoidal function was chosen

$$ u(t) = u_0 \sin \omega_1 t $$

The parameters used for the simulations are

K = 1, T = 5 s, $u_0$ = 1, $\omega_1$ = 1 1/s

Fig. 13.4. Scheme of a linear process with output noise

Figure 13.5 depicts the (measured) simulated data and shows that the data fluctuate around a plane, with fluctuations in all three directions of the coordinate system $(u, y, \dot{y})$.

Fig. 13.5. Measured (simulated) data of a first order dynamic process with original data


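Table 13.1. Eigenvalues and eigenvectors of data matrix $X^T X$

Eigenvalues:

$$ \lambda_1 = 5847, \qquad \lambda_2 = 491, \qquad \lambda_3 = 6.37 $$

Eigenvectors:

$$ p_1 = \begin{bmatrix} 0.0531 \\ 0.9985 \\ 0.0105 \end{bmatrix}, \qquad p_2 = \begin{bmatrix} 0.9940 \\ -0.0539 \\ -0.0956 \end{bmatrix}, \qquad p_3 = \begin{bmatrix} -0.0960 \\ -0.0054 \\ 0.9954 \end{bmatrix} $$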


The eigenvalues and eigenvectors calculated with the correlation matrix $X^T X$ are given in Table 13.1.

As $\lambda_3 \ll \lambda_1$ and $\lambda_3 \ll \lambda_2$, the eigenvalue $\lambda_3$ can be neglected. The transformation matrix becomes

$$ P = [p_1 \; p_2] = \begin{bmatrix} 0.0531 & 0.9940 \\ 0.9985 & -0.0539 \\ 0.0105 & -0.0956 \end{bmatrix} $$


Back-transformation with (13.21) results in the data shown in Figure 13.6. Compared 
to Figure 13.5, the data are now concentrated in one plane indicating a reduction of 
noise effects. 

□ 


13.2 Fault detection with PCA 

Different possibilities exist for fault detection with principal component analysis, see Figure 13.7. Direct application of change-detection methods (compare Chapter 7) to the transformed variables $t_j$ leads to the means

$$ \mu_j(k) = E\{t_j(k)\} \qquad (13.24) $$

and variances

$$ \sigma_j^2(k) = E\{[t_j(k) - \mu_j(k)]^2\}, \qquad j = 1, 2, \ldots, r \qquad (13.25) $$
Fig. 13.6. Measured (simulated) data of a first order system with two principal components only


However, the interpretation of the results may be a problem, and if new variances occur in variables that have been neglected by the PCA, they will not be detected [13.7]. Therefore the change detection can also be applied to the back-transformed variables $X^*$

$$ \mu_i(k) = E\{x_i^*(k)\} \qquad (13.26) $$

$$ \sigma_i^2(k) = E\{[x_i^*(k) - \mu_i(k)]^2\}, \qquad i = 1, 2, \ldots, m \qquad (13.27) $$

Here, observed deviations can be directly linked to the process variables $x_i(k)$.

A further way is to generate residuals between the original and the back-transformed variables

$$ r_i(k) = x_i(k) - x_i^*(k), \qquad i = 1, 2, \ldots, m $$

and to determine their means and variances

$$ \mu_{r_i}(k) = E\{r_i(k)\} \qquad (13.28) $$

$$ \sigma_{r_i}^2(k) = E\{[r_i(k) - \mu_{r_i}(k)]^2\} \qquad (13.29) $$

see Figure 13.7c. These measures describe the differences between the present data $x_i(k)$ and their principal-component-analyzed values $x_i^*(k)$. Significant deviations of the variables $x_i(k)$ can be detected when a threshold is exceeded, using the change-detection methods described in Chapter 7.
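The expectations in (13.28) and (13.29) have to be estimated from data. One simple estimator, used in the sketch below, is a moving window over the residuals $r_i(k)$; the window length, the fixed threshold and the simulated offset fault are arbitrary assumptions, and the change-detection methods of Chapter 7 may be more elaborate.

```python
import numpy as np

def residual_statistics(X, X_star, window):
    """Moving-window estimates of mean and variance of r_i(k) = x_i(k) - x_i*(k).

    Returns two arrays of shape (N - window + 1, m), cf. (13.28) and (13.29);
    row i belongs to the window ending at sample k = i + window - 1.
    """
    R = X - X_star
    N, m = R.shape
    mu = np.empty((N - window + 1, m))
    var = np.empty_like(mu)
    for k in range(window - 1, N):
        seg = R[k - window + 1 : k + 1]
        mu[k - window + 1] = seg.mean(axis=0)
        var[k - window + 1] = seg.var(axis=0)
    return mu, var

def exceeds(stat, threshold):
    """Flags where a statistic leaves the band [-threshold, threshold]."""
    return np.abs(stat) > threshold

# Illustrative residuals: noise only, plus a hypothetical offset fault in x_2 from k = 60
rng = np.random.default_rng(2)
X_star = np.zeros((100, 2))
X = 0.1 * rng.normal(size=(100, 2))
X[60:, 1] += 1.0
mu, var = residual_statistics(X, X_star, window=10)
flags = exceeds(mu, threshold=0.3)            # assumed threshold
print(flags[:, 1].argmax())                   # first window whose mean exceeds it
```

Because each residual is tied to one measured variable, a flag in a particular column points directly to the variable concerned, which is the advantage of this variant over monitoring the scores $t_j$.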

The PCA procedure discussed so far was described for block computation, i.e. the data first have to be stored, then transformed, and then residuals generated. This is directly applicable to time-windowed data. The next example shows how PCA can be applied for fault detection in real time.


Example 13.2: Fault detection with principal component analysis for a simulated dynamic first order process

Fault detection with PCA is now applied to the simulated first order process. Figure 13.8 shows the signal flow.

Fig. 13.8. Signal flow diagram for fault detection with PCA of a simulated first order process (corresponds to Figure 13.7c)

The variables

$$ x^T(k) = [u(k) \; y(k) \; \dot{y}(k)] $$

are sampled with sampling time $T_0$ = 0.02 s. The process is excited with a sinusoidal function with $\omega_1$ = 1 1/s as in Example 13.1. During a first time period of 10 s (a learning phase), all data are stored and the transformation matrix P is determined according to Example 13.1, leading to two principal components. Then $P P^T$ is known for back-transformation according to (13.21). For fault detection, the back-transformation is then calculated for each newly sampled data vector

$$ x^{*T}(k) = x^T(k)\, P P^T $$

with

$$ x^{*T}(k) = [u^*(k) \; y^*(k) \; \dot{y}^*(k)] $$

A residual vector is determined

$$ r^T(k) = x^T(k) - x^{*T}(k) $$

and a squared residual quantity

$$ r^2(k) = r^T(k)\, r(k) = \sum_{i=1}^{3} r_i^2(k) $$

then shows deviations caused by faults. Figure 13.9 shows simulations for changes (faults) of the time constant, the gain and an output offset during the time period $t_1$ = 10 s to $t_2$ = 20 s.

Fig. 13.9. Time history of signals without faults for t < 10 s and with faults for t > 10 s: (a) change of time constant T = 5 s → 3 s; (b) change of gain K = 1 → 2; (c) output offset
For t < 10 s, i.e. during the learning phase, a comparison with the measured signal clearly shows the noise reduction in the back-transformed output variable $y^*(k)$. After the changes are introduced, the output y(k) of the process deviates accordingly. But the principal-component-based output $y^*(k)$ does not contain the effect of the changes, so that differences r(k) occur, leading to significant deviations of the squared residual quantity $r^2(k)$. However, based on this quantity alone the changes cannot be isolated.

□ 
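The scheme of Example 13.2 (learning phase, then $x^{*T}(k) = x^T(k) P P^T$ and $r^2(k)$ for every new sample) can be sketched as follows. This sketch deviates from the example in several assumed ways: the process is discretized with a zero-order hold and sampled with $T_0$ = 0.5 s, the data vector is the lagged vector $[u(k-1) \; y(k) \; y(k-1)]$ from (13.22) instead of $[u \; y \; \dot{y}]$ (to avoid differentiating a noisy signal), only a gain fault K = 1 → 2 is simulated, and the alarm threshold (twice the largest $r^2$ of the learning phase) is a hypothetical rule.

```python
import numpy as np

T0 = 0.5                                     # assumed sampling time in s

def simulate(u, K, T, noise_std, rng):
    """First-order process K/(1 + T s), ZOH-discretized, plus white output noise.
    K and T are per-sample arrays so that parameter faults can switch on."""
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        a = np.exp(-T0 / T[k])
        y[k] = a * y[k - 1] + K[k] * (1.0 - a) * u[k - 1]
    return y + noise_std * rng.normal(size=len(u))

def data_vectors(u, y):
    """x^T(k) = [u(k-1), y(k), y(k-1)] for k = 1, ..., N-1 (lagged variables)."""
    return np.column_stack([u[:-1], y[1:], y[:-1]])

rng = np.random.default_rng(3)
N_learn, N_total = 200, 400
t = T0 * np.arange(N_total)
u = np.sin(1.0 * t)                          # u0 = 1, omega1 = 1 1/s as in Example 13.1
K = np.ones(N_total)
K[N_learn:] = 2.0                            # hypothetical gain fault K = 1 -> 2
T = np.full(N_total, 5.0)
y = simulate(u, K, T, noise_std=0.01, rng=rng)
X = data_vectors(u, y)                       # row i belongs to sample k = i + 1

# Learning phase: P from fault-free data (no centering, as in x* = x P P^T)
X_learn = X[: N_learn - 1]
lam, vecs = np.linalg.eigh(X_learn.T @ X_learn)
P = vecs[:, ::-1][:, :2]                     # two principal components
PPt = P @ P.T

# Detection phase: back-transformation and squared residual for every sample
R = X - X @ PPt                              # r^T(k) = x^T(k) - x*^T(k)
r2 = np.sum(R ** 2, axis=1)                  # r^2(k) = sum of r_i^2(k)
threshold = 2.0 * r2[: N_learn - 1].max()    # assumed threshold rule
alarms = r2 > threshold
print(alarms[: N_learn - 1].any(), alarms[N_learn:].mean())
```

In the fault-free case the discarded principal direction corresponds to the model relation between u(k-1), y(k) and y(k-1), so $r^2(k)$ contains only noise; the gain fault violates this relation and raises $r^2(k)$. As in the example, $r^2(k)$ alone does not tell which parameter has changed.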

Examples of fault detection with PCA are published in [13.12]; see also [13.8], [13.2]. An application example for an automotive wheel suspension is given in Chapter 22.
Comparison and combination of fault-detection methods


The different methods for fault detection are not easily compared, because the final practical results depend on many aspects, such as the kind of process, the kind of disturbances, open or closed loop, nonlinearities, the experience of the designer, etc. For some classes of processes the structure and at least some parameters are relatively well known, as for electrical, mechanical or hydraulic processes. For other classes only rough models are available, as, e.g., for many process industries (cement, chemical, mineral, metal and biochemical processing). In addition, many processes change their behavior continuously because of different operating points, wear and ageing, so that all methods with constant models are problematic.

However, under specified conditions comparisons are possible, see, e.g. [14.3] and [14.1].


14.1 Assumptions of model-based fault detection 

Model-based fault-detection methods use residuals that indicate changes between the process and the model. One general assumption is that the residuals change significantly, so that detection is possible despite their mostly inherent stochastic character. This means that the offset of the residuals after the appearance of a fault is large enough and lasts long enough to be detectable. This may be called a "significant change". The various fault-detection methods are characterized by their underlying assumptions, summarized in Table 14.1. All considered model-based methods of course require that the process can be described by a mathematical model. As there is almost never exact agreement between the process and its model, the kind and size of the model discrepancies are of primary interest. In the following, an attempt is made to summarize some special features of the different methods:


a) Parity equations 

• model structure and parameters must be known and must fit the process well; 

• especially suitable for additive faults; 
Table 14.1. Qualitative comparison of properties of different fault-detection methods for linear processes. SISO: single-input single-output. MIMO: multi-input multi-output

criteria | parity equations | state estimation (state observer, output observer) | parameter estimation

Assumptions
model structure | exactly known | exactly known | known
model parameters | known, constant | known, constant | unknown, time-varying
disturbance models for unknown inputs | exactly known | exactly known | exactly known
noise | small | small | medium
stability of detection scheme | no problem | state observer: depends on design; output observer: no problem | no problem
excitation by the input | additive faults: no; multiplicative faults: yes | additive faults: no; multiplicative faults: yes | additive faults: no; multiplicative faults: yes

Detectable faults
abrupt | yes | yes | yes
drift | yes | yes | yes
incipient | yes | yes | yes
single faults | yes | yes | yes
multiple faults | SISO: no; MIMO: yes | SISO: no; MIMO: yes | SISO: yes; MIMO: yes
fault isolation | MIMO: yes | MIMO: yes | SISO: yes; MIMO: yes
additive | yes | yes | yes
multiplicative | no | no | yes

General
robustness to parameter changes | problematic | problematic | unproblematic
nonlinear processes | many classes possible | limited | many classes possible
static processes | yes | no | straightforward
computational effort | small / medium | medium | medium / larger
closed loop | yes | yes | yes, external excitation
• in general multi-outputs required;

• very fast reaction after sudden faults;

• online real-time application possible for fast processes;

• computationally more intense than parity equations;

• no input signal changes required for additive faults (but then, some parameter changes are not detectable);

• some faults to be detected can be small (e.g. additive faults and gain), some must be large (e.g. time constants).

b) State observers, state estimation 

• the model structure including parameters must be known rather accurately; 

• especially suitable for additive faults; 

• in general multi-outputs required; 

• very fast reaction after sudden faults; 

• on-line real-time application possible for fast processes, if not too many observers are required;

• no input signal changes required for additive faults (but then some parameter changes, e.g. time constants, are not detectable);

• some faults to be detected can be small (e.g. additive faults and gain), some must be large;

• observers have properties very similar to those of parity equations.

c) Parameter estimation 

• model structure to be known; 

• especially suitable for multiplicative faults; 

• also additive faults on the input and output signal can be detected; 

• several parameter changes are uniquely detectable for one input and one output 
measurement; 

• very small changes are detectable, which includes the detection of slowly developing as well as fast developing faults;

• on-line real-time application possible, even for fast processes; 

• input excitation required for dynamic process parameters. 

Parity equations and observer-based methods have, in part, almost identical properties, but parity equations are much simpler to design, to implement and to understand. They can also easily be extended to nonlinear processes. Parity equations and observers are well suited for additive faults, but are not, in general, well suited for multiplicative faults. For multiplicative, i.e. parametric, faults, parameter estimation is best suited. Some additive faults can also be modelled as unknown parameters. However, parameter estimation with dynamic models generally needs a dynamic input excitation. For static processes, only measurements at different operating points are required.
A further essential difference is that parity equations and observer-based methods need more than one output measurement to detect and isolate several faults, whereas for parameter estimation one input and one output are sufficient to detect and diagnose different faults.

In the case of abrupt faults, state estimation and parity equations react faster than parameter estimation for the basic methods described above. This is because parameter estimation is intended to estimate constant values and to remove the influence of disturbances over time. If, however, parameter estimation is designed for time-varying parameters by a forgetting factor or by including a dynamic state model for the parameters (resulting in a Kalman-filter-type estimator), it is able to follow abrupt parameter changes rapidly at the cost of disturbance rejection. Likewise, state estimation can be designed for better disturbance rejection at the cost of following rapid state changes more slowly. Hence, for both parameter and state estimation, the ability to follow abrupt changes rapidly depends on the design.
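The effect of a forgetting factor can be illustrated with standard recursive least squares (RLS). The sketch below estimates a and b of a hypothetical first-order model y(k) = a y(k-1) + b u(k-1) + e(k) with a binary random input (an assumption), and compares the estimate of b after an abrupt change 0.1 → 0.2 for λ = 1 (no forgetting) and λ = 0.95. Model, noise level and λ values are illustrative choices.

```python
import numpy as np

def rls(y, u, lam, p0=1000.0):
    """Recursive least squares for y(k) = a*y(k-1) + b*u(k-1) + e(k).

    lam is the forgetting factor (lam = 1: no forgetting). Returns the
    parameter estimates [a, b] after every sample.
    """
    theta = np.zeros(2)
    P = p0 * np.eye(2)
    history = np.zeros((len(y), 2))
    for k in range(1, len(y)):
        psi = np.array([y[k - 1], u[k - 1]])            # data vector
        gamma = P @ psi / (lam + psi @ P @ psi)          # correction vector
        theta = theta + gamma * (y[k] - psi @ theta)     # update with a priori error
        P = (P - np.outer(gamma, psi @ P)) / lam         # covariance update
        history[k] = theta
    return history

rng = np.random.default_rng(4)
N, k_fault = 400, 300
u = rng.choice([-1.0, 1.0], size=N)                      # binary test signal (assumed)
b = np.where(np.arange(N) < k_fault, 0.1, 0.2)           # abrupt change of b at k_fault
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.9 * y[k - 1] + b[k] * u[k - 1] + 0.01 * rng.normal()

b_no_forget = rls(y, u, lam=1.0)[-1, 1]
b_forget = rls(y, u, lam=0.95)[-1, 1]
print(b_no_forget, b_forget)     # the estimate with forgetting is much closer to 0.2
```

Without forgetting, the estimate averages over the whole record and lags behind the change; with λ = 0.95 the effective memory is only a few tens of samples, which tracks the change quickly but makes the estimates more sensitive to noise.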

Of course, the assumptions on fault modelling have a large influence on all methods. As considered in Chapter 5, the modelling of faults requires an understanding of the many kinds of real physical faults and their mapping to the mathematical models and fault-detection methods used; compare the remarks on fault modelling at the end of Section 5.2.3, where it was stated that many faults are of multiplicative nature and that additive faults are applicable to some sensor and actuator faults, see also the examples in Part V.


14.2 Suitability of model-based fault-detection methods 

For single-input single-output processes (SISO) the results can be summarized as 
follows. As parameter estimation is especially suitable for multiplicative faults, this 
detection method can be primarily recommended for corresponding faults in the 
processes and faults which change the dynamics of actuators and sensors. But also 
additive faults at the input and output can be included in the parameter estimation, as 
for static actuator and sensor faults. State estimation and parity equations have their 
advantages for additive faults and are therefore feasible for corresponding faults in 
the sensors, actuators and in some cases for processes. 

State observers for SISO processes can be applied if the faults map into the observed state variables, as for leak detection in pipelines. However, the applicability of output observers and parity equations is rather limited for SISO systems, as only one residual can be generated, which does not allow different faults to be isolated.

For single-input multi-output processes (SIMO) and multi-input multi-output processes (MIMO), compare Figure 5.2, the analytical redundancy between the measured inputs and outputs increases. This is advantageous for parity equations and output observers, because it allows different residuals to be generated that can be made independent of certain inputs and faults, thus enabling fault isolation. On the other hand, however, it is more difficult to obtain precise MIMO process models with all cross-couplings.
14.3 Combination of different fault-detection methods

The preceding discussion shows that parameter estimation on the one side and state estimation and parity equations on the other side have advantages and disadvantages with regard to the detection of the various types of faults. Therefore, if all faults should be detectable, different detection methods should be combined properly in order to make use mainly of their advantages. As in most cases the model parameters are unknown anyway, it is quite natural to apply parameter estimation first. The following combinations of model-based detection methods then result [14.2], [14.3]:

1. Sequential parameter estimation and parity equations 

• parameter estimation to obtain the model; 

• parity equations for change detection with less computations; 

• parameter estimation (on request) for deep fault diagnosis. 

2. Sequential parameter and state estimation 

• parameter estimation to obtain the model; 

• state estimation for fast change detection; 

• parameter estimation (on request) for deep fault diagnosis. 

3. Parallel parameter and state estimation 

• for multiplicative and additive faults; 

• depending on input excitation. 

The way of combining depends very much on the process, the faults to be detected and the allowable computational effort.

In some cases, the integration of process-model-based and signal-model-based detection methods also gives good overall information.

4. Parameter estimation and vibration analysis 

• parameter estimation for parameter mapping faults; 

• vibration analysis for other types of faults like unbalance, knocking, chattering. (This is especially attractive for rotating machines.)

By this way of combining suitable detection methods the most relevant analytical 
symptoms can be generated and used for integrated fault diagnosis. 

Figure 14.1 shows a scheme for combination 1. Examples are shown in Part V.
Fig. 14.1. Combination of parameter estimation and parity equations: parity equations are used to detect changes in the process on-line and in real time. After the residuals exceed thresholds, parameter estimation is applied to gain more information on the process and to allow a deeper fault diagnosis (online, close to real time). (SVF: state variable filter, if required)
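The sequential scheme of Figure 14.1 can be sketched as a supervision loop: a cheap parity residual is checked at every sample, and only after a threshold violation is a least-squares parameter estimation run over a subsequent data window. The nominal model values, the threshold, the window length and the simulated gain fault below are all hypothetical, and the parity equation is written in equation-error form without a state variable filter.

```python
import numpy as np

# Hypothetical nominal model from a preceding parameter estimation:
# y(k) = A1_NOM * y(k-1) + B1_NOM * u(k-1)
A1_NOM, B1_NOM = 0.9, 0.1

def parity_residual(y, u, k):
    """Equation-error parity residual of the nominal first-order model."""
    return y[k] - A1_NOM * y[k - 1] - B1_NOM * u[k - 1]

def estimate_parameters(y, u, start, stop):
    """Least-squares estimate of [a1, b1] from samples start..stop-1 (on request)."""
    Psi = np.column_stack([y[start - 1 : stop - 1], u[start - 1 : stop - 1]])
    theta, *_ = np.linalg.lstsq(Psi, y[start:stop], rcond=None)
    return theta

def supervise(y, u, threshold, window=50):
    """Check the parity residual every sample; on the first threshold violation,
    trigger parameter estimation over the following window of data."""
    for k in range(1, len(y)):
        if abs(parity_residual(y, u, k)) > threshold:
            stop = min(k + window, len(y))
            return k, estimate_parameters(y, u, k, stop)
    return None, None

rng = np.random.default_rng(5)
N, k_fault = 400, 300
u = rng.choice([-1.0, 1.0], size=N)
b = np.where(np.arange(N) < k_fault, 0.1, 0.25)          # hypothetical gain fault
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.9 * y[k - 1] + b[k] * u[k - 1] + 0.005 * rng.normal()

k_detect, theta = supervise(y, u, threshold=0.05)
print(k_detect, theta)     # detection at the fault instant, estimate near [0.9, 0.25]
```

The parity check costs a few multiplications per sample, while the more expensive estimation runs only on request, which is the motivation given for combination 1.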
Fault-Diagnosis Methods



15 


Diagnosis procedures and problems 


15.1 Introduction to fault diagnosis 

The fault diagnosis task consists of determining the fault type with as many details as possible, such as the fault size, location and time of detection. The diagnostic procedure is based on the observed analytical and heuristic symptoms and the heuristic knowledge of the process, as shown in Figure 2.7.

Figure 15.1 summarizes the individual steps both for automatically measured variables and for human observation. In both cases, a feature extraction and a detection of changes from the normal or nominal situation take place. Analytical and heuristic symptoms must then be brought into a unified symptom representation in order to perform the diagnosis.


[Figure 15.1: measured variables → calculated features → analytical symptoms → unified symptoms → faults, spanning fault detection and fault diagnosis]

Fig. 15.1. General scheme for fault detection and fault diagnosis with analytical and heuristic knowledge


Note that features were described in Section 2.5 as values extracted from signal or process models that describe the status of the process (e.g. parameters, state variables, parity equation errors or residuals), and that symptoms are unusual changes of the features from their normal or nominal values. In a fault-free case the symptoms are zero.

The inputs to the knowledge-based diagnosis procedure are all available symptoms as facts and the fault-relevant knowledge about the process. In more detail, these are (compare Section 2.1):

a) Analytical symptoms 

The analytical symptoms $s_{a,i}$ are the results of the limit checking of measurable signals, of signal- or process-model-based fault-detection methods and of change-detection methods, as described in Section 2.1 and Chapter 7.

b) Heuristic symptoms 

Heuristic symptoms $s_{h,i}$ are the observations of the operating personnel in the form of acoustic noise, oscillations or optical impressions like colors or smoke, obtained by inspection. These empirical facts can usually only be represented in the form of qualitative measures, e.g. as linguistic expressions like "little", "medium" or "much".

c) Process history and fault statistics 

A third category of facts depends on the general status, based on the history (past life) of the process. This process history includes past information on running time, load measures, and the last maintenance or repair. If fault statistics exist (e.g. from "statistical process control"), they describe the frequency of certain faults for the same or similar processes. Depending on the quality of these measures, they can be used as analytical or heuristic symptoms. However, the information on the process history is in general vague, and its facts have to be taken as heuristic symptoms.

The knowledge about the symptoms can be represented, e.g., in the form of data strings and can include, for example, number, name, numerical value, reference value, calculated confidence or membership value, time of detection and explanatory text [15.1].

d) Unified symptom representation 

For the processing of all symptoms in the inference mechanism, it is advantageous to use a unified representation. One possibility is to represent the analytic and heuristic symptoms $s_j$ with confidence numbers $0 \le c(s_j) \le 1$ and to treat them in the sense of probabilistic approaches known from reliability theory [15.1]. Another possibility is the representation as membership functions $0 \le \mu(s_j) \le 1$ of fuzzy sets [15.4].

By these kinds of fuzzy sets and corresponding membership functions, all analytic and heuristic symptoms can be represented in a unified way within the range $0 \le \mu(s_j) \le 1$. These integrated symptom representations are then the inputs for the diagnosis procedure, see Figure 2.4 and Figure 15.1. Diagnosis knowledge representation, including a priori knowledge and various symptom representations, is treated in Section 15.2.1.
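A unified fuzzy representation can be sketched as two mappings into the range 0 ≤ μ(s) ≤ 1: a piecewise-linear membership function for an analytical symptom derived from a residual, and a lookup table for linguistic heuristic symptoms. The ramp placed around the detection threshold and the numerical values assigned to "little", "medium" and "much" are assumptions for illustration; the text does not prescribe particular membership functions.

```python
def ramp_membership(value, zero_at, one_at):
    """Piecewise-linear membership function: 0 below zero_at, 1 above one_at."""
    if value <= zero_at:
        return 0.0
    if value >= one_at:
        return 1.0
    return (value - zero_at) / (one_at - zero_at)

def analytic_symptom(residual, threshold):
    """Map |residual| to mu(s) in [0, 1]; the ramp around the threshold is assumed."""
    return ramp_membership(abs(residual), 0.5 * threshold, 1.5 * threshold)

# Assumed mapping of linguistic heuristic observations to membership values
LINGUISTIC = {"little": 0.25, "medium": 0.5, "much": 1.0}

def heuristic_symptom(term):
    """Map a linguistic observation ("little", "medium", "much") to mu(s)."""
    return LINGUISTIC[term]

symptoms = {
    "s_a1 (parity equation residual)": analytic_symptom(0.5, threshold=0.5),
    "s_h1 (smoke observed by operator)": heuristic_symptom("medium"),
}
print(symptoms)   # both symptoms now share the range 0 <= mu(s) <= 1
```

A small residual maps to μ = 0, consistent with symptoms being zero in the fault-free case, while the ramp avoids a hard jump exactly at the threshold.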
e) Fault-symptom relationships

The propagation of faults to observable symptoms generally follows physical cause-effect relationships. Figure 15.2a shows that a fault generally influences events as intermediate steps, which then influence the measurable or observable symptoms, both through internal physical properties. The underlying physical laws, however, are mostly not known in analytical form, or are too complicated for calculations. Fault diagnosis proceeds in the reverse direction: it has to conclude from the observed symptoms to the faults, see Figure 15.2b. This implies the inversion of causality. One cannot expect to reconstruct the fault-symptom chains solely from measured data, because the causality is not reversible or the reversal is ambiguous [15.2]. The intermediate events between faults and symptoms are not always visible from the behavior of the symptoms. Therefore, structured knowledge, known from inspection of the faulty behavior of the process, mostly has to be included.


[Figure 15.2: (a) physical system, direction of effect; (b) diagnosis system, from observation to diagnosis]

Fig. 15.2. Fault-symptom relationship: (a) physical system: from faults to symptoms; (b) diagnosis system: from symptoms to faults


If no information is available on the fault-symptom causalities, experimentally 
trained classification methods can be applied for fault diagnosis, see Chapter 16. 
This leads to an unstructured knowledge base. If the fault-symptom causalities can be expressed in the form of if-then rules, reasoning or inference methods are applicable. This is considered in Chapter 17. Figure 15.3 gives a survey of the diagnosis methods treated in the next two chapters.


15.2 Problems of fault diagnosis¹

The main challenges of fault diagnosis are given by the knowledge representation, 
the introduction of prior knowledge, the typical symptom distributions and the data 
size and representation. These problems will be briefly introduced in the following. 

¹ Follows Dominik Füssel, [15.2]

Fig. 15.3. Survey of fault-diagnosis methods


15.2.1 Diagnosis knowledge representation 

Depending on the diagnosis method, different forms of knowledge representation can be used:

1) Analytic diagnostic knowledge comes from physical laws or quantitative measurements and observations. Typical examples are physically based fault-symptom relationships as in Figure 15.2a, which are required for rule-based inference methods, or measurements that form the basis of a classification scheme in which the knowledge is aggregated indirectly in certain parameters.

2) Heuristic diagnostic knowledge is knowledge that is not explicitly written down or described. It results from learning by experiments, especially by trial-and-error methods, and comes from the experience of operators and system engineers. Possible forms of knowledge representation are (following [15.1]):

• Rules 

• Frames (object-oriented representations) 

• Predicate logics 

• Directed graphs (especially networks and tree structures) 

Heuristic knowledge is frequently processed by inference mechanisms such as forward and backward reasoning.

15.2.2 Prior knowledge 

In many applications, prior knowledge is available. It can come from experience or from physical understanding of the processes. In some cases, it can even be quantitative knowledge of certain relationships. This is especially advantageous for model-based fault detection and diagnosis, because the physical parameters $\mathbf{p}_{\mathrm{phys}}$ have a functional relationship to the model parameters $\boldsymbol{\theta}_{\mathrm{model}}$, see Chapters 5.2, 9.4 and 23.4:

$$\boldsymbol{\theta}_{\mathrm{model}} = \mathbf{f}(\mathbf{p}_{\mathrm{phys}}) \qquad (15.1)$$
The influence of faults on the physical parameters is usually known. With the knowledge of (15.1), one can utilize this information. Sometimes, only characteristics of the relationship (such as monotonicity) are known. An example of such a relation is the rotor resistance of a DC motor, which depends on the physical variable “temperature”. Although the temperature is not a parameter of the motor model, one can nevertheless estimate its influence on the resistance of the motor winding. Physical knowledge then tells that overheating of the motor is indicated by a high resistance parameter.

Other prior knowledge might be more general such as information about similar 
or independent faults that can be used to structure the diagnosis system. In any case, 
all usable prior knowledge should be utilized. 

15.2.3 Typical statistical symptom distributions 

If a diagnosis system is to be built from experimental data, one has to consider the 
typical statistical data distributions that can occur. Such diagnosis systems rely on 
two types of symptoms: 

1) Estimated model parameter changes 

2) Deviations of model outputs and measured signals (residuals) 

Faults that influence a physical parameter will be reflected by changes of the model parameters. Chapter 23.4 shows that the relationships (15.1) are frequently multilinear functions of the kind

$$\theta_{\mathrm{model}} = \sum_j c_j \prod_i p_{\mathrm{phys},ij} \qquad (15.2)$$

or can be approximated as such in certain regions. Here $c_j$ denotes constant parameters, and $\theta_{\mathrm{model}}$ and $p_{\mathrm{phys}}$ denote the model and physical parameters, respectively. Changes of physical parameters therefore lead to multiplicative deviations of the model output signals from the measured signals. From this knowledge one can conclude the typical symptom distributions arising from multiplicative faults. These distributions can be favorable or unfavorable for some classification approaches, [15.2].
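To see why parameter changes act multiplicatively, consider an assumed static proportional process $y = K u$ (an illustration, not an example taken from [15.2]) whose gain changes by the relative amount $\Delta$ because of a fault, while the model keeps the nominal gain:

```latex
\begin{aligned}
y(t) &= K\,(1+\Delta)\,u(t) &&\text{(faulty process)}\\
\hat{y}(t) &= K\,u(t) &&\text{(nominal model)}\\
r(t) &= y(t) - \hat{y}(t) = \Delta\,K\,u(t) &&\text{(residual)}
\end{aligned}
```

The residual is proportional to the input $u(t)$, so the same fault produces symptoms whose magnitude depends on the operating point and vanishes for $u = 0$.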

15.2.4 Data size 

Another typical problem of experimental fault diagnosis by classification methods is the size of the available data sets. To have a statistically sufficient data base, one would have to evaluate a large amount of data. This, however, is not only tedious: sometimes, faulty systems that could be used for measurements are simply not available, or an artificial fault cannot be introduced for other reasons. A faulty system might be too dangerous to operate or too expensive to obtain. Numerical simulations will always be only a more or less adequate substitute, because one can hardly simulate all effects of real faults. This problem is especially severe because the data is often high-dimensional. This leads to extremely sparse data sets that are difficult to handle (“curse of dimensionality”). The diagnosis of a DC motor, for instance, is built from a 21-dimensional symptom space, see [15.2]. Algorithms that automatically build diagnosis systems must be able to handle this problem of sparse data.
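A simple count illustrates the sparsity. The sketch below assumes, purely for illustration, that each of the 21 symptoms is divided into 10 intervals and that 10,000 labelled samples are available; it bounds the fraction of grid cells that can contain any data at all.

```python
def grid_cells(n_dims, bins_per_dim):
    """Number of cells of a regular grid over the symptom space."""
    return bins_per_dim ** n_dims


def max_occupied_fraction(n_samples, n_dims, bins_per_dim):
    """Upper bound on the fraction of cells holding at least one sample."""
    return min(1.0, n_samples / grid_cells(n_dims, bins_per_dim))


# Assumed setting: 21 symptoms, 10 intervals each, 10,000 samples
cells = grid_cells(21, 10)
fraction = max_occupied_fraction(10_000, 21, 10)
print(cells, fraction)
```

At most one cell in $10^{17}$ can contain a sample, so grid-based density estimates such as plain histograms become practically unusable in this setting.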
15.2.5 Symptom representation

To derive a diagnosis system that is able to cope with different problems, one must find a common symptom representation. Generally, two different sorts of information can be involved, as discussed in Section 15.1: analytical information and operator-observed information. This information can occur as four different data types (following [15.5]):

1) Binary variables; 

2) Multi-valued variables;

3) Interval scaled variables; 

4) Metric variables. 

A multi-valued variable is given if its different states can be coded by characters. One has to distinguish whether the characters constitute an order (like 1, 2, 3 or small, medium, large) or merely name different states. The first kind can be translated into metric variables. The latter kind can be replaced by a set of binary variables. Interval-scaled variables contain a statement that refers to a certain interval (like 50%), but generally resemble metric variables. Since in practice metric variables cannot assume arbitrary values either, there is practically no difference between these two variable types.

Binary variables, however, are different. They are often coded with a 0/1 scheme and then treated like metric variables. This is appropriate as long as the number of binary variables is not so large that it hinders numerical methods. Some algorithms (such as clustering methods) can be influenced by such artificial variables and become unstable. Multi-valued variables without ordering should always first be coded as binary variables and then treated metrically. The difficulty of such conversions of data types is that methods dealing with data are usually designed for one representation only and will deteriorate if such artificial numbers are involved.
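The coding rules above can be sketched as follows. The example variables (brush fire color, a three-step size rating) are hypothetical; unordered values are coded as 0/1 indicator variables, ordered values as ranks.

```python
def one_hot(values, categories=None):
    """Code an unordered multi-valued variable as a set of 0/1 variables."""
    if categories is None:
        categories = sorted(set(values))
    index = {c: k for k, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1          # exactly one indicator is set per observation
        rows.append(row)
    return categories, rows


def ordinal(values, order):
    """Map an ordered multi-valued variable to metric values 1, 2, 3, ..."""
    rank = {c: k + 1 for k, c in enumerate(order)}
    return [rank[v] for v in values]


cats, rows = one_hot(["yellow", "blue", "none", "blue"])
ranks = ordinal(["small", "large", "medium"], order=["small", "medium", "large"])
print(cats, rows, ranks)
```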

The solution to this problem is the representation of all variables as fuzzy sets, as mentioned in the last section. Figure 15.4 shows some examples for cases where the symptom $s_i$ either increases or decreases. Figure 15.4a, b has the advantage that only one membership function has to be processed, in contrast to Figure 15.4b, where five membership functions for linguistically expressed changes have to be processed. This makes it possible to process both input from human operators and numerical data of different kinds, [15.3]. The disadvantage of the fuzzy representation is that the translation into fuzzy sets can cause an information loss through the projection onto so-called fuzzy membership values. Care must therefore be taken to choose appropriate functions for the fuzzy sets. This issue will be addressed in Chapter 17.
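The five-term representation can be sketched with triangular membership functions. The shapes, the labels and the equal spacing over $[-\Delta s_{\max}, +\Delta s_{\max}]$ are assumptions for illustration and are not read off Figure 15.4; with this choice, the memberships of neighboring terms always sum to one.

```python
def triangle(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 1.0 if x == peak else 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


LABELS = ["strongly decreased", "decreased", "unchanged", "increased", "strongly increased"]


def fuzzify(delta_s, ds_max):
    """Membership of a symptom change delta_s in five linguistic terms (hypothetical partition)."""
    x = max(-ds_max, min(ds_max, delta_s))  # saturate outside [-ds_max, +ds_max]
    h = ds_max / 2.0                        # spacing of the peaks
    peaks = [-2 * h, -h, 0.0, h, 2 * h]
    return {name: triangle(x, p - h, p, p + h) for name, p in zip(LABELS, peaks)}


print(fuzzify(0.25, 1.0))
```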


15.3 Problems 

1) Take a DC motor as an example and classify the following symptoms as “analytical” or “heuristic”: increase of armature resistance, vibrations, yellow or blue brush fire, smoke, reduced torque, reduced speed, high temperature.
Fig. 15.4. Example of the unified symptom representation as membership functions in the form of fuzzy sets (symptom change axis from -Δs, “decreased”, to +Δs, “increased”)


2) Use the symptoms of 1) and develop a fault-symptom tree for the following faults: excessive current, brushes without solid contact, plugged cooling channel, bearing fault.

3) Which analytical and heuristic diagnostic knowledge is available for a proportionally acting electromagnetic solenoid actuator?

4) State observable symptoms for faults of a thermocouple used as a temperature sensor.
Fault diagnosis with classification methods¹


The task of the diagnosis system is to separate $n_f$ different faults

$$F_j, \quad j \in \{1, \ldots, n_f\} \qquad (16.1)$$

using $n_s$ symptoms

$$s_i, \quad i \in \{1, \ldots, n_s\} \qquad (16.2)$$

The faults are combined into a fault vector

$$\mathbf{F} = [F_1\; F_2\; \ldots\; F_{n_f}] \qquad (16.3)$$

and the symptoms into a symptom vector

$$\mathbf{s} = [s_1\; s_2\; \ldots\; s_{n_s}]^T \qquad (16.4)$$


Nearly all methods compute a fault measure $f_j$ for each fault class $F_j$. The most probable fault is then the one with the maximum value $f_j$. In reality, however, not only the largest $f_j$ is relevant: unclear situations and measurement noise can produce high values of several fault measures $f_j$, which indicates an uncertain decision. Hence, all values of $f_j$ are ultimately important for the diagnosis statement of the system.
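A decision rule that keeps this in mind can be sketched in a few lines of Python. The margin of 0.2 between the largest and the second-largest fault measure is a hypothetical tuning parameter, not a value from the text; fault indices are 0-based.

```python
def diagnose(f, margin=0.2):
    """Return the index of the largest fault measure f_j and whether the decision is unclear."""
    ranked = sorted(range(len(f)), key=lambda j: f[j], reverse=True)
    best = ranked[0]
    runner_up = f[ranked[1]] if len(f) > 1 else float("-inf")
    unclear = f[best] - runner_up < margin  # several high f_j -> uncertain diagnosis
    return best, unclear


print(diagnose([0.1, 0.8, 0.3]))
print(diagnose([0.45, 0.5, 0.05]))
```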

In this and the next chapter, two main classes of fault diagnosis methods are considered: the classification methods without structural knowledge and the inference methods with structural knowledge.


16.1 Simple pattern classification methods 

This section briefly reviews simple classification methods that are nevertheless common in fault diagnosis applications. Understanding these methods lays the groundwork for the more advanced methods of the following sections. If no structural knowledge is available for the relation between the symptoms and the faults, classification or pattern recognition methods can be applied. Figure 16.1 shows that reference symptom vectors $\mathbf{s}_{\mathrm{ref}}$ are determined experimentally for the faults $F_j$ by learning or training. Comparison with the observed symptom vector $\mathbf{s}$ then determines the faults by classification.

¹ Compiled by Dominik Füssel, [16.7]



Fig. 16.1. Fault diagnosis with classification methods


Figure 16.2 summarizes the problem statement in more detail. The classification methods are used to represent the diagnostic mapping from the symptom space to the space of fault measures $f_j$. The notation $t_j$ for the desired binary value of $f_j$ was chosen following the equivalent term “target value” from neural network terminology, where the target value is the desired binary output that the network is trained to produce. In Section 16.6 neural networks are discussed as diagnosis methods. In that context, the fault indicators become the target values of the network training.

The classifier needs to map the relationship between the symptoms (calculated from measurements) and the fault indicators $f_j$ with binary desired values $t_j$. Since the classification methods usually compute a value between 0 and 1, an additional maximum operation is then needed to determine the fault diagnosis result.

The most commonly used classification methods are summarized in Figure 16.3. 
Historically, the statistical methods came first, followed by the density-based meth¬ 
ods and general approximation approaches. The artificial intelligence methods were 
historically the latest to be developed for diagnosis problems. 


16.2 Bayes Classification 

Fig. 16.2. General problem of fault diagnosis with pattern classification methods

Fig. 16.3. Pattern classification methods

The best-known classification scheme is the so-called Bayes classification. The approach is based on reasonable assumptions about the statistical distribution of the symptoms. A common procedure is to assume Gaussian probability density functions, [16.8]:

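$$p(\mathbf{s}) = \frac{1}{(2\pi)^{n_s/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{s}-\mathbf{s}_0)^T \boldsymbol{\Sigma}^{-1}(\mathbf{s}-\mathbf{s}_0)\right) \qquad (16.5)$$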

This function is determined by its constants, the covariance matrix $\boldsymbol{\Sigma}$ and the center $\mathbf{s}_0$. The common procedure is to determine maximum likelihood estimates of these parameters. The center, for instance, is given by the mean value of the reference data. Other approaches use recursive parameter estimation methods or Bayesian inference. Building a classification system from the probability density estimates requires the class-specific densities. It can be shown that the number of wrong decisions is minimized if the class with the maximum posterior probability $p(F_j|\mathbf{s})$ is selected. This posterior probability can be calculated with the help of Bayes' law:

$$f_j(\mathbf{s}) = p(F_j|\mathbf{s}) = \frac{p(\mathbf{s}|F_j)\,P(F_j)}{p(\mathbf{s})} \qquad (16.6)$$
The class-specific densities $p(\mathbf{s}|F_j)$ can be estimated from labelled reference data using those data points that belong to fault $F_j$, Figure 16.4. One of the methods mentioned above can be employed to obtain $p(\mathbf{s}|F_j)$. Since only the maximum of the $f_j$ is of interest, the denominator in (16.6) is not significant, because it does not depend on $F_j$. It only plays the role of a normalizing factor and is irrelevant for the comparison. The prior probabilities $P(F_j)$, however, are important. Provided enough reference data is available, they can be estimated from their relative frequency of occurrence in the data set. In many applications, unfortunately, these priors cannot be determined. If the reference data is created from experiments in which the occurrence of the faults can be influenced, one should assume

$$P(F_j) = \frac{1}{n_f} \qquad (16.7)$$

unless experience suggests a better choice. This assumes that all faults occur with the same probability. The importance of the priors for the overall quality of the diagnosis system should not be underestimated. Carefully selected, they can improve the performance of a diagnosis system substantially. The common assumption of Gaussian distributions can be problematic in reality. This holds, e.g., for distributions with overlapping fault areas, [16.7].
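A minimal Gaussian Bayes classifier along the lines of (16.5) to (16.7) might look as follows. It is a sketch assuming NumPy, maximum likelihood estimates of center and covariance per fault class, and equal priors (16.7) unless others are given; the synthetic two-fault data are invented for the example.

```python
import numpy as np


def fit_gaussian_bayes(S, labels, n_faults, priors=None):
    """ML estimates of center s_0 and covariance Sigma for each fault class F_j."""
    params = []
    for j in range(n_faults):
        Sj = S[labels == j]
        mean = Sj.mean(axis=0)
        cov = np.atleast_2d(np.cov(Sj, rowvar=False, bias=True))  # bias=True: divide by N_j (ML)
        params.append((mean, cov))
    if priors is None:
        priors = np.full(n_faults, 1.0 / n_faults)  # equal priors, cf. (16.7)
    return params, np.asarray(priors, dtype=float)


def gaussian_pdf(s, mean, cov):
    """Multivariate normal density, cf. (16.5)."""
    d = s - mean
    norm = np.sqrt((2.0 * np.pi) ** mean.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)


def posteriors(s, params, priors):
    """f_j(s) = p(s|F_j) P(F_j) / p(s), cf. (16.6); p(s) only normalizes."""
    joint = np.array([gaussian_pdf(s, m, C) * P for (m, C), P in zip(params, priors)])
    return joint / joint.sum()


# Usage with synthetic reference data for two faults
rng = np.random.default_rng(0)
S = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(3.0, 0.5, (200, 2))])
labels = np.repeat([0, 1], 200)
params, priors = fit_gaussian_bayes(S, labels, n_faults=2)
f = posteriors(np.array([0.1, -0.2]), params, priors)
print(f)
```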



Fig. 16.4. Example of two class-specific densities $p(s|F_j)$ (the statistical distribution of one symptom $s$ belonging to two different faults $F_1$ and $F_2$)


A less restrictive approach is to use a histogram to generate a nonparametric estimate of $p(\mathbf{s}|F_j)$. A necessary condition is then again the availability of sufficient amounts of data. If not enough data points are available, one can try histogram methods with variable-size grids. This, again, is very similar to the geometric classifiers that are the subject of the following section.
16.3 Geometric classifiers


Geometric classifiers determine the class membership of a data point from its distance to reference data points. These reference data points are characterized by their symptom values $\mathbf{s}_{\mathrm{ref},i}$ and their known class assignment (class $F_1$ or class $F_2$, etc.).

The simplest and best-known approach is the nearest neighbor classification, which evaluates the Euclidean distance. To determine the class of a data point $\mathbf{s}_j$, for instance, one has to compute the distances from this data point to all reference data points and determine the minimum:

$$\min_{i \in \{1,\ldots,n_{\mathrm{ref}}\}} d_i, \qquad d_i = \|\mathbf{s}_j - \mathbf{s}_{\mathrm{ref},i}\| \qquad (16.8)$$

where $n_{\mathrm{ref}}$ denotes the number of reference data points. The class of the reference point $\mathbf{s}_{\mathrm{ref,min}}$ closest to $\mathbf{s}_j$ is then taken as the class of $\mathbf{s}_j$. Figure 16.5 illustrates the concept.
Fig. 16.5. Example of nearest-neighbor classification. The minimum distance from the point $\mathbf{s}_j$ is found to a reference data point belonging to $F_2$. Therefore, the result of the classification of $\mathbf{s}_j$ is $F_2$


The drawbacks of this method become obvious if, for instance, only a few reference points exist in overlapping class regions. The resulting decision boundary is not smooth and hence probably suboptimal. A regularization can then be achieved if not the distance to just one reference point is evaluated, but rather the $k$ nearest reference points $\mathbf{s}_{\mathrm{ref}}$ are used. The $k$ nearest neighbor approach uses a vote of the $k$ closest data points. This method evidently comes close to a local, parameter-free probability density estimation: the class to which most of these $\mathbf{s}_{\mathrm{ref}}$ belong is identical to the one with the highest relative frequency of occurrence in the hypersphere centered around $\mathbf{s}$ that contains the $k$ points. The size of this hypersphere is governed by the reference data density at $\mathbf{s}$.
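The $k$ nearest neighbor vote can be sketched with NumPy as follows; the reference data are invented so that $k = 1$ and $k = 3$ give different answers. Ties in the vote are resolved in favor of the closer neighbor, which is an implementation choice of this sketch.

```python
import numpy as np
from collections import Counter


def knn_classify(s, S_ref, labels_ref, k=1):
    """Majority vote of the k reference points closest to s (Euclidean distance)."""
    d = np.linalg.norm(S_ref - s, axis=1)       # distances d_i, cf. (16.8)
    nearest = np.argsort(d, kind="stable")[:k]  # indices of the k smallest d_i
    votes = Counter(labels_ref[nearest].tolist())
    # most_common() lists equal counts in first-encountered order, so the closer neighbor wins ties
    return votes.most_common(1)[0][0]


S_ref = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [0.6, 0.6]])
labels_ref = np.array([1, 1, 2, 2, 2])          # fault index of each reference point
s = np.array([0.45, 0.45])
print(knn_classify(s, S_ref, labels_ref, k=1), knn_classify(s, S_ref, labels_ref, k=3))
```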

A commonly cited disadvantage of nearest neighbor classification is the need to store all reference data points. This problem is largely alleviated by modern computing and storage capabilities. Furthermore, there are techniques that reduce the number of points to be stored by an intelligent selection strategy. One example is the so-called condensed nearest neighbor approach, [16.9].
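The idea of condensing can be sketched as follows. The sketch uses the classic condensing rule attributed to Hart (keep adding reference points that the currently stored subset misclassifies with the 1-NN rule until a full pass adds nothing); whether [16.9] uses exactly this variant is an assumption, and the data are invented. The resulting subset is not necessarily minimal.

```python
import numpy as np


def condense(S_ref, labels_ref):
    """Hart-style condensing: keep a subset with which 1-NN still classifies the reference data."""
    keep = [0]                                  # start with an arbitrary first reference point
    changed = True
    while changed:                              # repeat passes until nothing is added
        changed = False
        for i in range(len(S_ref)):
            if i in keep:
                continue
            d = np.linalg.norm(S_ref[keep] - S_ref[i], axis=1)
            if labels_ref[keep[int(np.argmin(d))]] != labels_ref[i]:
                keep.append(i)                  # misclassified by the stored subset: add it
                changed = True
    return sorted(keep)


S_ref = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels_ref = np.array([0, 0, 0, 1, 1, 1])
kept = condense(S_ref, labels_ref)
print(kept)
```

For these two well-separated clusters, one stored point per class suffices.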
16.4 Polynomial classification

The polynomial classification uses a special functional approximation for the posterior probabilities of the classes instead of the Gaussian functions assumed for the Bayes classification scheme. It employs polynomials

$$p(F_j|\mathbf{s}) \approx f_j(\mathbf{s}) = a_{j,0} + a_{j,1}s_1 + a_{j,2}s_2 + \ldots + a_{j,n_s+1}s_1 s_2 + \ldots \qquad (16.9)$$

defined by their parameters $\mathbf{a}_j = [a_{j,0}\; a_{j,1}\; \ldots\; a_{j,n_p}]^T$, with the maximal number of polynomial terms given by

$$n_{p,\max} = \binom{n_s + o}{o} \qquad (16.10)$$

where $o$ is the highest polynomial order. The coefficients are determined using a least squares approach with the loss function $V_j$:

$$V_j = \sum_{k=1}^{N} \big(t_j(\mathbf{s}(k)) - f_j(\mathbf{s}(k))\big)^2 \rightarrow \min \qquad (16.11)$$

where $k$ runs over all $N$ reference data points. The $t_j(k)$ follow the usual 0/1 notation, i.e.

$$t_j(\mathbf{s}(k)) = \begin{cases} 1 & \text{if } \mathbf{s}(k) \text{ belongs to fault } F_j \\ 0 & \text{otherwise} \end{cases} \qquad (16.12)$$

This formulation is reasonable because the fault class of the reference data points is known. Hence, the probabilities, here equal to the target values, can only be 0 or 1. The polynomial classifier is used in the following way:

• evaluate all polynomials (one per fault class);

• find the maximum of the computed $f_j$: the data point belongs to the fault class of the corresponding polynomial.

Figure 16.6 shows a typical decision boundary for a two-class problem. The decision boundary is given by the line of equal polynomial values $f_1(\mathbf{s}) = f_2(\mathbf{s})$, i.e. of equal approximated posterior probabilities $p(F_1|\mathbf{s})$ and $p(F_2|\mathbf{s})$. The decision boundary itself is in general a polynomial of the same order as the $f_j$.

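Since $f_j$ is linear in the parameters, the parameters $\mathbf{a}_j$ minimizing (16.11) can be solved for directly:

$$\hat{\mathbf{a}}_j = \left(\boldsymbol{\Psi}^T \boldsymbol{\Psi}\right)^{-1} \boldsymbol{\Psi}^T \mathbf{t}_j \qquad (16.13)$$

with

$$\mathbf{t}_j = [t_j(\mathbf{s}(1))\; t_j(\mathbf{s}(2))\; \ldots\; t_j(\mathbf{s}(N))]^T \qquad (16.14)$$

and

$$\boldsymbol{\Psi} = \begin{pmatrix} 1 & s_1(1) & s_2(1) & \ldots & s_1(1)s_2(1) & \ldots \\ 1 & s_1(2) & s_2(2) & \ldots & s_1(2)s_2(2) & \ldots \\ \vdots & \vdots & \vdots & & \vdots & \\ 1 & s_1(N) & s_2(N) & \ldots & s_1(N)s_2(N) & \ldots \end{pmatrix} \qquad (16.15)$$

Fig. 16.6. Example of the decision boundary of a polynomial classifier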
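The least squares fit (16.11) to (16.15) can be sketched with NumPy. The sketch assumes a complete polynomial of order $o$ (all monomials, as counted by (16.10)) and uses `numpy.linalg.lstsq` instead of forming the inverse in (16.13) explicitly, which is numerically more robust; the XOR-like data set is invented to show that the product term $s_1 s_2$ makes a problem solvable that is not linearly separable.

```python
import numpy as np
from itertools import combinations_with_replacement


def poly_terms(S, order):
    """Regressor matrix Psi: all monomials of the symptoms up to `order`, cf. (16.15)."""
    N, n_s = S.shape
    cols = [np.ones(N)]
    for o in range(1, order + 1):
        for idx in combinations_with_replacement(range(n_s), o):
            cols.append(np.prod(S[:, list(idx)], axis=1))
    return np.column_stack(cols)


def fit_polynomial_classifier(S, labels, n_faults, order=2):
    """Least squares parameters a_j (one column per fault class), cf. (16.11)-(16.13)."""
    Psi = poly_terms(S, order)
    T = np.eye(n_faults)[labels]                # 0/1 targets t_j(s(k)), cf. (16.12)
    A, *_ = np.linalg.lstsq(Psi, T, rcond=None)
    return A


def classify(S, A, order=2):
    """Evaluate all polynomials f_j(s) and pick the maximum."""
    F = poly_terms(S, order) @ A
    return F.argmax(axis=1), F


# XOR-like arrangement: not linearly separable, but separable with the s1*s2 term
S_xor = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
labels_xor = np.array([0, 0, 1, 1])
A = fit_polynomial_classifier(S_xor, labels_xor, n_faults=2, order=2)
pred, F = classify(S_xor, A, order=2)
print(pred)
```

With `order=1`, the least squares fit returns 0.5 for both fault measures at all four points (up to rounding), so the classes cannot be separated. Here $\boldsymbol{\Psi}$ has six columns but only four rows (and $s_i^2 = s_i$ on 0/1 data), so $\boldsymbol{\Psi}^T\boldsymbol{\Psi}$ is singular and (16.13) cannot be applied directly; `lstsq` returns the minimum-norm least squares solution instead.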


The simplest case is given with $o = 1$. The resulting classification system is popular and known under different names. One is the linear discriminant technique frequently used in statistics, especially in medical diagnosis applications. Such a system can solve linearly separable problems, i.e. all those problems for which the classes can be separated by a line (in 2-D), or by a plane or hyperplane in multidimensional problems. For two classes, this plane is obtained from the linear polynomial classifier by setting (16.9) equal to 0.5. For the case of Gaussian distributions with diagonal covariance matrices, different approaches lead to the identical result, regardless of whether they are based on pseudo-inverse solutions like (16.13), nonlinear optimization techniques like the gradient descent method, or perceptron learning. The achieved solution coincides with the optimal decision boundary given by Bayes' law.

For the general polynomial classifier, the biggest problem is to choose the appropriate order $o$ and to select the correct polynomial terms. The polynomial should usually not be complete. Especially with $o > 1$ and larger $n_s$, a complete polynomial would be unnecessarily large in most applications. The following has to be considered in choosing the polynomial:


• a large number of terms can easily cause overfitting with poor generalization behavior;

• a small number of terms might lead to systems that are not flexible enough to distinguish the classes;

• a wrong selection of the terms can lead to numerical problems. Linearly dependent column vectors of $\boldsymbol{\Psi}$ make a solution with (16.13) infeasible.


Typical polynomial orders are $o = 1$ (which corresponds to the linear discriminant technique), $o = 2$ and $o = 3$. Higher orders are not necessary in most applications. The selection of only a subset of all possible polynomial terms can be based on two principles:
1) Selection according to the ability to reduce the error measure $V_j$ from (16.11);

2) Selection according to their linear independence: the most independent terms are selected first.

Completely or nearly linearly dependent terms should be removed in any case. The selection strategies can be implemented while solving (16.11), for instance with a Gauss-Jordan algorithm, [16.16], a transformation like the Orthogonal Least Squares, [16.14], or a singular value decomposition, [16.15]. The selection process not only improves the numerical stability of (16.13), but also indicates inappropriately chosen or meaningless symptoms $s_i$. This information can be of great help in designing the symptoms or reducing the complexity of the system.
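A singular value decomposition makes such dependencies visible. The sketch below is a simple rank check with NumPy, not the Orthogonal Least Squares or Gauss-Jordan selection of [16.14], [16.16]; the relative tolerance of $10^{-8}$ and the example matrix are assumptions. Right singular vectors belonging to (nearly) zero singular values show which columns of $\boldsymbol{\Psi}$ combine to zero.

```python
import numpy as np


def dependent_combinations(Psi, rtol=1e-8):
    """Numerical rank of Psi and coefficient vectors of (nearly) vanishing column combinations."""
    _, sv, Vt = np.linalg.svd(Psi)              # singular values in descending order
    rank = int(np.sum(sv > rtol * sv[0]))
    return rank, Vt[rank:]                      # rows v with Psi @ v approximately 0


# Hypothetical regressor matrix with a redundant term: column 3 = column 1 + column 2 (0-based)
rng = np.random.default_rng(1)
s = rng.normal(size=(50, 2))
Psi = np.column_stack([np.ones(50), s[:, 0], s[:, 1], s[:, 0] + s[:, 1]])
rank, null = dependent_combinations(Psi)
print(rank, np.round(null, 3))
```

The nonzero entries of the null vector (here on the columns of $s_1$, $s_2$ and $s_1 + s_2$) identify the group of terms from which one should be removed.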


16.5 Decision trees 

Different types of decision trees that are used to classify data originated in the social sciences. Similar approaches are also common in the classification of botanical species. The system basically relies on a series of questions that have to be answered; depending on each answer, the next question narrows down the species further until the exact plant is determined. Typical for biological problems are binary features that can be answered without ambiguity. The collection of all questions forms the complete decision tree. One can picture a whole set $S_0$ of data tuples being subdivided into two sets $S_{11}$ and $S_{12}$ by a first decision. The two sets are then again broken down into more sets, forming a tree. Ideally, the splitting is finished when the sets contain only a single class of data, so that a further division is not necessary. The class information of the remaining set is assigned to a leaf of the tree. The use of the tree is straightforward: starting from the top, a new data point passes through the decisions until it reaches a leaf and is classified according to the leaf's class membership.

Figure 16.7a shows such a tree for the distinction of the two faults $F_1$ and $F_2$ using two continuously distributed symptoms $s_1$ and $s_2$. The decisions are binary but based on continuous variables. The example also shows that the tree is not complete, i.e. it does not have the same depth in all branches. The resulting segmentation of the symptom space can be seen in Figure 16.7b.

How is such a tree structure built if not from the prior knowledge of an expert? For fault diagnosis, this involves two questions: which symptom is to be chosen and, following the example, which threshold value yields a sensible division into subsets? The procedure is based on a single-step optimal strategy: one implements the decision that results in subsets of maximum “purity”, meaning that the sets should contain data of the most similar type, ideally of only one single class. Instead of the purity, one calculates a measure of impurity, the entropy of the set, based on the statistical definition of an entropy index:

$$i_{\mathrm{entropy}}(P) = -\sum_j P_j \log P_j \qquad (16.16)$$

The occurrence probability $P_j$ of the fault $F_j$ in the resulting set is usually replaced by its relative frequency of occurrence. This is a valid procedure for larger data sets.
Fig. 16.7. Decision tree for the distinction of two classes $F_1$ and $F_2$: (a) decision tree; (b) resulting partitioning of the $s_1$-$s_2$ plane. The symptoms $s_1$ and $s_2$ are continuous variables


Tree growing can be described as making the decisions that extract the most information from the data sample. A pure leaf of the tree contains no further information.

Figure 16.8 shows the behavior of the entropy index function for the case of two classes. The abscissa is the probability $P_1$ of the first class; $P_2$ is then directly given by $1 - P_1$. Figure 16.8 also displays a second function, the Gini index, which is sometimes used instead of (16.16) and is defined as:

$$i_{\mathrm{Gini}}(P) = 1 - \sum_j P_j^2 \qquad (16.17)$$

This impurity measure is calculated faster because it does not require the evaluation of a logarithm. Its behavior is very similar to (16.16), as can be seen in Figure 16.8. Both functions are symmetric with respect to $P_1 = 0.5$, which is clear since the behavior must not depend on the naming of the classes, and both functions vanish for pure sets (i.e. $P_{1,2} = 0$ or $P_{1,2} = 1$). The maximum impurity is reached for $P_1 = P_2 = 0.5$.
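The two impurity measures and the single-step optimal, axis-orthogonal split search can be sketched in plain Python. Assumptions: the base-2 logarithm in (16.16) (so that the maximum for two classes is 1), relative frequencies for $P_j$, thresholds halfway between neighboring symptom values, and the sample-weighted average impurity of the two subsets as the split criterion.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy index (16.16), base 2, with relative frequencies as P_j."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def gini(labels):
    """Gini index (16.17)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(samples, labels, impurity=entropy):
    """Single-step optimal split: (symptom index, threshold, weighted impurity of the subsets)."""
    best = (None, None, impurity(labels))
    n = len(labels)
    for i in range(len(samples[0])):
        values = sorted(set(x[i] for x in samples))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [l for x, l in zip(samples, labels) if x[i] <= thr]
            right = [l for x, l in zip(samples, labels) if x[i] > thr]
            score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
            if score < best[2]:
                best = (i, thr, score)
    return best


samples = [(0.1, 5.0), (0.2, 1.0), (0.8, 5.0), (0.9, 1.0)]
labels = [0, 0, 1, 1]
print(entropy(labels), gini(labels), best_split(samples, labels))
```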

The decision tree is constructed by choosing the decisions that minimize the entropy of the sets on the next tree level. The algorithms are optimal in a single step, but do not necessarily lead to an overall optimal tree, that is, one of minimum size. A search for the overall optimal decision tree is not feasible with normal computational power. The algorithms are therefore augmented with various growing and pruning approaches, which weigh the complexity of the resulting structure against its classification performance to yield sensible compromises without overfitting.

Fig. 16.8. Entropy measure function and Gini index for a simple two-class problem.

Originally, decision trees were developed for decisions based on discrete-valued data. The problem with continuous variables is the theoretically infinite number of decisions that can be based on them. One possibility is to discretize the variables by means of interval techniques. It is reported that the necessary computational time is not a fundamental problem.

The more serious problem of the approach is the fact that it is only optimal in one step. In certain data configurations, no subdivision reduces the index $i(P)$, which normally terminates the algorithm. It can, however, be beneficial to carry out the split anyway and further subdivide the resulting sets: the average entropy of the sets on the second level can be much lower than that of the original set, although the intermediate split does not decrease the entropy measure.
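A small, invented XOR-like data set shows the effect. Every possible axis-orthogonal split of the root set leaves the entropy unchanged, so a purely single-step criterion would stop, while a second split produces pure subsets. The sketch assumes base-2 entropy with relative frequencies.

```python
import math
from collections import Counter


def entropy(labels):
    """Base-2 entropy of a label list, using relative frequencies."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def weighted_entropy(groups):
    """Sample-weighted average entropy of several subsets."""
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * entropy(g) for g in groups)


# XOR-like arrangement of two faults in the s1-s2 plane (illustrative data)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)] * 2
labels = [1 if (x > 0.5) != (y > 0.5) else 0 for x, y in points]

root = entropy(labels)
# First level: split on s1 at 0.5 (a split on s2 behaves the same by symmetry)
level1 = weighted_entropy([
    [l for (x, _), l in zip(points, labels) if x <= 0.5],
    [l for (x, _), l in zip(points, labels) if x > 0.5]])
# Second level: additionally split both subsets on s2 at 0.5
level2 = weighted_entropy([
    [l for (x, y), l in zip(points, labels) if (x <= 0.5) == xl and (y <= 0.5) == yl]
    for xl in (True, False) for yl in (True, False)])
print(root, level1, level2)
```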

The decisions of standard trees are univariate, resulting in axis-orthogonal splits in the input space. While it is possible to approximate any decision boundary with axis-orthogonal segments, this is not always desirable because it requires a large tree structure with many decisions. This explains why the methods work best with relatively weakly correlated inputs. There are methods using multivariate splits, but they are increasingly difficult to handle since the number of possibilities increases drastically.

Decision trees are common in medical diagnosis applications. It seems that the 
intuitive, simple scheme that can be trusted and understood is important for this area 
of application. 
16.6 Neural Networks for fault diagnosis

In the following, the two most important neural network topologies will be briefly reviewed: multi-layer perceptron (MLP) networks and networks composed of radial basis functions (RBF). These networks differ from the aforementioned diagnosis approaches in that they represent static function approximators that are not restricted to any special kind of algebraic function.

While the Bayes classifier assumes a special function (Gaussian) to be parameterized, neural networks are designed to match an arbitrary function by reducing an appropriate error measure $V$, usually defined as a sum of squared errors similar to (16.11). The mapping function of the network depends on a number of weights $\mathbf{w}$ that contain the information of the network. Network training is done by adapting the weights to minimize $V$, see also Section 9.3.3. Neural networks of the aforementioned types have been shown to be universal approximators, meaning that they can fit any continuous function on a bounded domain to arbitrary accuracy, provided their structure is sufficiently large. As diagnosis tools they are trained with exactly the same target values as, for instance, the polynomial classifier, which means that they also approximate the posterior class probabilities.

An interesting consequence of the sum-of-squares error function has been pointed out in [16.3]: the trained network mapping is given by the conditional average of the target data:

$$f_j(\mathbf{s};\mathbf{w}^*) = E\{t_j|\mathbf{s}\} = \int t_j\, p(t_j|\mathbf{s})\, dt_j \qquad (16.18)$$

with $\mathbf{w}^*$ representing the weights at the minimum of the error function. $p(t_j|\mathbf{s})$ denotes the probability density of the target value $t_j$ given the symptom values $\mathbf{s}$, and $E\{\ldots\}$ denotes the expectation value.

If the network is trained with as many outputs as there are faults to be distinguished and the target values are again chosen as binary values according to (16.12), this expression becomes:

$$f_j(\mathbf{s};\mathbf{w}^*) = \int t_j \Big(\sum_i \delta(t_j - \delta_{ji})\, P(F_i|\mathbf{s})\Big)\, dt_j \qquad (16.19)$$

with $\delta(\cdot)$ being the Dirac delta function and $\delta_{ji}$ the Kronecker delta. $P(F_i|\mathbf{s})$ is the probability that $\mathbf{s}$ belongs to fault $F_i$. Since the Dirac delta function is zero everywhere except at 0, the integral is easily computed as $\sum_i \delta_{ji} P(F_i|\mathbf{s})$, which yields:

$$f_j(\mathbf{s};\mathbf{w}^*) = P(F_j|\mathbf{s}) \qquad (16.20)$$

meaning that the outputs of the network correspond to the Bayesian posterior prob¬ 
abilities of the fault classes given the symptom pattern s. This suggests that the net¬ 
work structures trained with a labelled set of reference data and sum-of-squares error 
functions have the potential to learn the statistically optimal decisions given by the 
intersections of the posterior probabilities while not being constrained to an assump¬ 
tion about the probability density functions. 
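The statement (16.18) to (16.20) can be checked numerically in the simplest possible setting. The sketch below replaces the network by a lookup-table model with one free output value per value of a discrete symptom (an assumption standing in for a sufficiently flexible network), fits it to invented 0/1 targets by least squares, and compares the result with the relative class frequencies.

```python
import numpy as np

# Discrete symptom with three possible values; hypothetical labelled reference data
s = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2])
fault = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1])  # 0 -> F1, 1 -> F2
T = np.eye(2)[fault]                                      # 0/1 targets as in (16.12)
Psi = np.eye(3)[s]                                        # one free output value per symptom value
W, *_ = np.linalg.lstsq(Psi, T, rcond=None)               # sum-of-squares minimum, cf. (16.11)

# Relative frequencies of the faults for each symptom value, counted directly from the data
freq = np.array([T[s == v].mean(axis=0) for v in range(3)])
print(W.round(3))
print(freq.round(3))
```

The least squares outputs coincide with the relative frequencies, i.e. with estimates of $P(F_j|s)$.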
It has to be stressed, however, that this result presupposes that the optimal network weights $\mathbf{w}^*$ at the minimum of $V(\mathbf{w})$ have been found. This requires not only an appropriately sized network that is able to fit the probability function, but also that problems like overfitting are solved and that the optimization algorithm used converges properly.

In the field of fault diagnosis, neural networks are frequently employed: in a survey, about half of the publications using a classification procedure for fault diagnosis relied on neural networks, [16.10]. Since efficient tools for network training and implementation have become easily available, it is likely that neural networks are used in more than half of the applications today. They provide a means to achieve decent classification results with relatively moderate design effort.

16.6.1 Multi-layer perceptron networks 

The structure of a multi-layer perceptron (MLP) network is shown in Figure 16.9. It uses each of the symptoms as one input and calculates one output per fault. Since the output represents a probability, its range should be the interval [0, 1]. This can easily be ensured by using a sigmoidal activation function in the output layer of the network. That is unusual for most other applications of MLP networks, which commonly require an unbounded continuous output and hence employ a linear function in the last network layer, as treated in Section 9.3.3 for process identification.



Fig. 16.9. Multi-layer perceptron network for fault diagnosis. The typical configuration has symptoms $s_i$ as inputs, one output $f_j$ per fault class and sigmoidal activation functions in the output layer.


The first network layer performs a projection of the data by a weighted sum of the input values and a bias. If one denotes the output of the first neuron in the first hidden layer by $z_1^{(1)}$, it is calculated as:

$$z_1^{(1)} = \gamma\left(x_1^{(1)}\right) \tag{16.21}$$

with $\gamma(\cdot)$ being the activation function of the network. Its input $x_1^{(1)}$ is given by:

$$x_1^{(1)} = \sum_{i=1}^{n_s} w_{i1}^{(1)} s_i + w_{01}^{(1)} \tag{16.22}$$
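Stacking (16.21) and (16.22) over all neurons and layers gives the forward pass of the diagnosis MLP of Figure 16.9. The sketch below is a minimal illustration under stated assumptions: one hidden layer with tanh activations (the text leaves the hidden activation function open), sigmoidal outputs so that each fault output lies in (0, 1), and random, untrained weights; the names `mlp_forward`, `W1`, `b1`, `W2` and `b2` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(s, W1, b1, W2, b2):
    """Forward pass for one symptom vector s of length n_s.

    W1: (n_hidden, n_s) hidden weights,  b1: (n_hidden,) hidden biases
    W2: (n_f, n_hidden) output weights,  b2: (n_f,) output biases
    """
    x1 = W1 @ s + b1          # weighted sum plus bias, cf. (16.22)
    z1 = np.tanh(x1)          # hidden activation, cf. (16.21); tanh is assumed
    x2 = W2 @ z1 + b2         # the same projection in the output layer
    return sigmoid(x2)        # sigmoid keeps every fault output in (0, 1)

rng = np.random.default_rng(1)
n_s, n_hidden, n_f = 4, 6, 3              # symptoms, hidden neurons, faults
W1, b1 = rng.normal(size=(n_hidden, n_s)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(n_f, n_hidden)), rng.normal(size=n_f)

f = mlp_forward(np.array([0.1, -0.3, 0.8, 0.0]), W1, b1, W2, b2)
print(f)   # one (untrained) output per fault class
```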

This projection is performed similarly in each of the layers of the network. It makes the MLP a good choice for high-dimensional problems, since a small number of hidden neurons effectively removes irrelevant information from the inputs. On the other hand, a high input dimensionality leads to a large number of free parameters, which is problematic if the reference database is not sufficiently large.

The MLP should be used in cases where the class boundaries have complex shapes and a large number of data points is available. Since it must be trained with a nonlinear optimization technique, its training time can become relatively long. With modern, highly efficient training algorithms, however, this is no longer a serious problem. Even networks with multiple hidden layers can be trained relatively fast. Examples of MLP networks for the discrimination of different fault situations can be found in [16.1], [16.11], [16.19], [16.13], [16.12], [16.23], [16.24] and [16.26]. In [16.25], a set of neural networks is used, each trained to distinguish only one of all possible faults from the nominal situation. The advantages of this approach are smaller networks with simpler functions to be learned. The overall number of parameters, however, is most likely larger than in a single-network configuration.

In some applications [16.2], the MLP network is used not only for the diagnosis but at the same time for generating symptoms directly from the measurement signals. In this case, the usual distinction between fault detection and fault diagnosis no longer applies. The task of the neural network is then a direct mapping from measurements to faults. This concept may be appealing at first sight due to its simplicity; however, it is feasible only for simple static systems. Furthermore, the often highly specific domain knowledge of the system designer is completely lost.

A major problem of MLP networks is their poor extrapolation behavior. The data sets in diagnostic applications are not always complete and are relatively sparse compared to other classification areas like image processing. This creates a need for a diagnosis system that also works outside the trained symptom domain, which is especially problematic for MLP networks. The neurons use global activation functions that contribute significant shares of the network output over the whole input domain. If the network is trained with many degrees of freedom to fit a complex decision boundary, these neurons will create completely unpredictable outputs in the extrapolation region, even close to the training area. This problem was first formulated in [16.11], where it was suggested to use locally approximating networks like radial-basis function networks instead. The problem occurs in particular if a high-dimensional input space is sparsely populated, leaving vast areas without training data.

16.6.2 Radial-basis function networks 

The second most widely used networks for diagnosis are radial-basis function (RBF) networks. Most frequently, non-normalized basis functions with a singleton in the output layer are used, compare Figure 16.10. The advantage of this configuration is the simple optimization of the output layer weights: the network output depends linearly on them, hence a deterministic single-step (least-squares) optimization can be used, as described in Section 9.3.3.
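Because the network output is linear in the output layer weights, training that layer for fixed basis functions is an ordinary linear least-squares problem. The sketch assumes Gaussian basis functions with a common, hand-picked width, centers placed on a regular grid purely for illustration, and a hypothetical two-fault data set with 0/1 targets; `numpy.linalg.lstsq` performs the single-step solution. Note that such a linear output layer does not by itself restrict the outputs to [0, 1].

```python
import numpy as np

def rbf_design(S, centers, width):
    """Activations of Gaussian basis functions, shape (N, m)."""
    d2 = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

rng = np.random.default_rng(2)
# Hypothetical symptom data: two faults in a 2-D symptom space.
S = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(100, 2)),
               rng.normal([1.5, 1.5], 0.3, size=(100, 2))])
labels = np.repeat([0, 1], 100)
T = np.eye(2)[labels]                         # binary targets t_j in {0, 1}

grid = np.linspace(-0.5, 2.0, 5)
centers = np.array([[a, b] for a in grid for b in grid])    # m = 25 centers
Phi = rbf_design(S, centers, width=0.5)

# Output layer weights: linear least squares, solved in a single step.
W, *_ = np.linalg.lstsq(Phi, T, rcond=None)                 # shape (m, 2)
F = Phi @ W
accuracy = (F.argmax(axis=1) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```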

Consequently, research has concentrated on the more difficult problem of placing the basis functions. The standard procedure is to cluster the input data and locate basis functions at the cluster centers. Typical approaches are based on $k$-means clustering or Kohonen feature maps to group the input data.
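A minimal $k$-means sketch for placing basis functions is shown below. It assumes plain Lloyd iterations with a fixed iteration count and a farthest-point initialization (chosen here only to make the result deterministic); the data are hypothetical. The class labels are never used, which is the unsupervised property discussed in the following paragraph.

```python
import numpy as np

def kmeans(S, k, n_iter=50):
    """Plain k-means; returns the centers and the cluster index of each point."""
    # Farthest-point initialization (an assumption made for determinism):
    # start at the first point, then add the point farthest from all chosen centers.
    centers = [S[0]]
    for _ in range(1, k):
        C = np.array(centers)
        d2 = ((S[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        centers.append(S[d2.min(axis=1).argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # assignment step: nearest center in squared Euclidean distance
        d2 = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # update step: move each center to the mean of its points (keep it if empty)
        for c in range(k):
            members = S[assign == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return centers, assign

rng = np.random.default_rng(3)
true_means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
S = np.vstack([rng.normal(m, 0.2, size=(50, 2)) for m in true_means])
centers, assign = kmeans(S, k=3)
print(np.round(centers, 2))   # candidate basis function locations
```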



Both clustering and Kohonen feature maps are based on the minimization of a distance from the data points to reference vectors. An iterative procedure creates the reference vectors that represent a set of data points. The approach usually does not use information about the class membership of the data points and is thus called unsupervised. Hence, it can happen that overlapping data from different classes are assigned to a single basis function. The network will then hardly be capable of differentiating the classes by adapting only the output layer weights.

This problem has led to the development of various constructive algorithms that successively create new basis functions. They are typically driven by the performance of the network and refine the output by adding new locations for basis functions. Examples are the approach developed in [16.6] and the LOLIMOT algorithm developed in [16.14]. An interesting approach is connected to the decision trees discussed in Section 16.5. By first learning a division of the input domain via a univariate decision tree algorithm, one obtains a useful segmentation of the input space. Placing one basis function per segment is reported to yield a network with superior classification performance compared to the decision tree alone as classifier.

The alternative to constructive algorithms is a procedure that first creates too many basis functions and then removes unnecessary ones in a systematic way. One such approach is known as global ridge regression and is realized with an additional regularization term in the least-squares error function (16.11):
$$V^* = \sum_{k=1}^{N} \left(t_j(\mathbf{s}(k)) - f_j(\mathbf{s}(k))\right)^2 + \alpha \sum_{i=1}^{m} \left(w_i^{(2)}\right)^2 \tag{16.23}$$

The $w_i^{(2)}$ are the $m$ output layer weights of the radial-basis function network, and $\alpha$ is a constant controlling the strength of the regularization. The $f_j$ are again the network outputs, which should lie between zero and one. The desired network outputs are the $t_j(\mathbf{s}(k))$. They are known fault indicators having a value of either 0 or 1. In the context of neural networks, the term target value is common for this desired network output.

Since the optimization minimizes $V^*$, weights that are not relevant for the performance of the network are driven towards zero. In a second step, the corresponding neurons can be pruned from the network, yielding a compact, yet well-performing network. The procedure should prune only one neuron at a time and then repeat the optimization.
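For fixed basis functions, (16.23) is a ridge regression problem with the closed-form solution $\mathbf{w} = (\Phi^{\mathrm{T}}\Phi + \alpha I)^{-1}\Phi^{\mathrm{T}}\mathbf{t}$, where $\Phi$ is the $N \times m$ matrix of basis function activations and $\mathbf{t}$ the vector of target values. The sketch below combines this solution with the one-neuron-at-a-time pruning loop. Because ridge regression shrinks weights towards zero without making them exactly zero, a threshold `tol` is assumed as pruning criterion; the value of α, the threshold and the random stand-in activations are illustrative assumptions.

```python
import numpy as np

def ridge_weights(Phi, t, alpha):
    """Minimizer of sum_k (t_k - Phi_k w)^2 + alpha * sum_i w_i^2."""
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + alpha * np.eye(m), Phi.T @ t)

def prune(Phi, t, alpha, tol):
    """Drop the neuron with the smallest |w_i| while that weight is below tol."""
    keep = list(range(Phi.shape[1]))
    w = ridge_weights(Phi[:, keep], t, alpha)
    while len(keep) > 1 and np.abs(w).min() < tol:
        keep.pop(int(np.abs(w).argmin()))          # remove one neuron only ...
        w = ridge_weights(Phi[:, keep], t, alpha)  # ... then re-optimize
    return keep, w

rng = np.random.default_rng(4)
N, m = 200, 10
Phi = rng.uniform(0.0, 1.0, size=(N, m))      # stand-in for basis activations
true_w = np.zeros(m)
true_w[[1, 4, 7]] = [0.6, -0.5, 0.8]          # only three relevant neurons
t = Phi @ true_w + rng.normal(0.0, 0.01, size=N)

keep, w = prune(Phi, t, alpha=0.1, tol=0.05)
print(keep, np.round(w, 2))
```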

One major difference between MLP and RBF networks lies in their extrapolation behavior. Since each basis function influences the network output only in a limited region of the input space, RBF networks are said to be locally approximating. For diagnosis applications, this can be favorable: observed data points outside the coverage of the training data do not significantly activate any basis function. Consequently, the outputs assume values close to zero. This can be interpreted as a rejection, i.e. the outputs signal that a new situation has occurred and no definite conclusion can be drawn. This does not give a completely satisfactory answer, but at least avoids a wrong diagnosis.
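The rejection behavior can be sketched by checking how strongly the best-matching basis function responds to a new symptom vector: if no Gaussian basis function exceeds a threshold, the point is reported as unknown instead of being assigned to a fault. The threshold of 0.1, the width, the centers and the identity output weights are illustrative assumptions, not values from the text.

```python
import numpy as np

def classify_or_reject(s, centers, width, W, threshold=0.1):
    """Return the index of the winning fault output, or None to signal rejection."""
    phi = np.exp(-((centers - s) ** 2).sum(axis=1) / (2.0 * width ** 2))
    if phi.max() < threshold:
        return None                  # outside the trained symptom domain
    return int(np.argmax(phi @ W))   # otherwise pick the largest output f_j

centers = np.array([[0.0, 0.0], [1.5, 1.5]])   # one basis function per fault
W = np.eye(2)                                   # hypothetical trained weights
print(classify_or_reject(np.array([0.1, -0.1]), centers, 0.5, W))   # -> 0
print(classify_or_reject(np.array([5.0, -4.0]), centers, 0.5, W))   # -> None
```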

This property can also be exploited for on-line training of the network. In [16.4] it is suggested to augment the network with new basis functions if a data point occurs that does not lie close enough to any of the a priori known fault classes, i.e. basis functions. This closeness is evaluated with an appropriate measure of the distance of the point from known input areas. Similar concepts are common in constructive algorithms for locating the basis functions of radial-basis function networks.

The local approximation of the network furthermore allows an interpretation of the system to some degree: each second-layer weight $w_i^{(2)}$ belongs to a well-defined area of the input domain. Taking result (16.20) into account, the weights can be explained as a somewhat smoothed estimate of the corresponding posterior probabilities $P(F_j|\mathbf{s})$. Examples of radial-basis function networks for diagnosis are the fault diagnosis of industrial plants [16.17], chemical reactors [16.20], [16.22] and a two-link manipulator [16.21].

Radial-basis function networks with an intelligent basis function placement strategy are suitable for problems with many reference data points, because the optimization of the weights in the last layer can be performed very efficiently. Owing to their local approximation behavior, they are generally less critical with respect to extrapolation, but they require more storage for the network parameters than a comparably performing MLP network.

16.6.3 Clustering and self-organizing networks 

Occasionally, clustering approaches or the closely related self-organizing maps are used for the diagnostic classifier design [16.5]. These methods are unsupervised and rely on the assumption that training data points that are near to each other in the symptom space also belong to the same fault class. They find clusters in the data by evaluating neighborhood measures and employing competitive strategies. The next step determines, for each cluster, the fault class to which the majority of its data points belong. This gives labels for the clusters. When a data point is to be classified, one determines which cluster it belongs to and assigns the corresponding label.
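The clustering-labelling procedure can be sketched in a few lines. The example assumes that the cluster centers have already been found (here they are simply given) and uses hypothetical data; note how the F2 point lying near the F1 cluster is absorbed by the majority label.

```python
import numpy as np
from collections import Counter

def label_clusters(S, labels, centers):
    """Assign to every cluster the majority fault label of its training points."""
    d2 = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    cluster_labels = {}
    for c in range(len(centers)):
        members = labels[assign == c].tolist()
        # empty clusters receive no label
        cluster_labels[c] = Counter(members).most_common(1)[0][0] if members else None
    return cluster_labels

def classify(s, centers, cluster_labels):
    """Label of the nearest cluster center."""
    nearest = int(((centers - s) ** 2).sum(axis=1).argmin())
    return cluster_labels[nearest]

S = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]])
labels = np.array(["F1", "F1", "F2", "F2", "F2", "F2"])   # one F2 point near F1
centers = np.array([[0.1, 0.1], [2.0, 2.0]])
cluster_labels = label_clusters(S, labels, centers)
print(cluster_labels, classify(np.array([0.15, 0.15]), centers, cluster_labels))
```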

This “clustering-labelling” approach, however, is not as powerful as the supervised neural network methods described above. Complex decision boundaries and overlapping classes are especially hard to handle, since the method does not use the class information to build the network or to adapt any weights; the class information is only employed during the labelling. As initialization procedures for radial-basis function networks, or to determine initial membership functions in fuzzy-logic-based systems, these clustering methods are nevertheless very useful, see Section 17.2.

An overview of other neural network techniques for fault diagnosis and some 
practical considerations for their use can be found in [16.18] and [16.19]. 


16.7 Problems 

1) Compare the a priori assumptions of different classification methods, like Bayes, geometric and polynomial classifiers. Use Figure 16.5 as an example.

2) What are the differences between geometric classifiers and classifiers based on 
decision trees? 

3) Specify the input and output variables for the classification with neural networks. 

4) Which classification methods should be used if only a few symptom data points (e.g. 5) are available for each fault?
Fault diagnosis with inference methods


For some technical processes, the basic relationships between faults and symptoms are at least partially known in the form of causal relations: fault → events → symptoms.

Figure 17.1a shows a corresponding causal network, with the nodes as states and the edges as relations. The establishment of these causalities follows either the fault-tree analysis (FTA), proceeding from faults through intermediate events to symptoms (the physical causalities), or the event-tree analysis (ETA), proceeding from the symptoms to the faults (the diagnostic forward-chaining causalities), see, e.g. [17.31] and Section 4.2. To perform a diagnosis, this qualitative knowledge can be expressed in the form of rules

IF ( condition ) THEN ( conclusion ) 

The condition part (premise) contains facts in the form of symptoms $s_i$ as inputs, and the conclusion part includes events $e_k$ and faults $f_j$ as a logical cause of the facts. If several symptoms indicate an event or fault, the facts are associated by AND and OR connectives, leading to rules of the form

IF ( $s_1$ AND $s_2$ ) THEN ( $e_1$ )

IF ( $e_1$ OR $e_2$ ) THEN ( $f_1$ )

compare Figure 17.1b. 
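With binary symptoms, the two example rules can be evaluated directly with Boolean operators. The sketch is illustrative only: the symptom values are hypothetical, and the rule for event $e_2$ (here simply driven by a third symptom $s_3$) is an assumption added so that the OR rule has a second input.

```python
def evaluate_rules(s1: bool, s2: bool, s3: bool) -> dict:
    """Evaluate the example rule chain for binary symptoms."""
    e1 = s1 and s2          # IF (s1 AND s2) THEN (e1)
    e2 = s3                 # assumed rule: IF (s3) THEN (e2)
    f1 = e1 or e2           # IF (e1 OR e2) THEN (f1)
    return {"e1": e1, "e2": e2, "f1": f1}

for symptoms in [(True, True, False), (True, False, False), (False, False, True)]:
    print(symptoms, evaluate_rules(*symptoms))
```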

For the establishment of this heuristic knowledge several approaches exist, see 
[17.18], [17.58]. 



Fig. 17.1. Fault diagnosis using inference methods: (a) causal network; (b) fault symptom tree 
17.1 Fault trees

Fault trees are established as a graphical tool for visualizing the relationships in reliability and diagnosis [17.50], representing binary relationships. They are directed graphs showing the fault situation at the top and the symptoms and conditions below. Elements of the tree are logic connections, events and symptoms. An example is shown in Figure 17.2.



Fig. 17.2. Scheme of a fault tree for binary symptoms (states) [0,1] 


Fault trees are a good and intuitive graphical tool for displaying the binary relationships that lead to failures. The hierarchical structure supports human comprehension. They are common for analysis and diagnosis in safety-critical applications. Quantitative failure probabilities can also be derived from fault trees. This requires information about the failure probabilities of the individual elements; by combining these with the relationships from the fault tree, it is possible to calculate the probability of a system failure. The fault tree is usually important during the early system design phase for identifying the critical subsystems that contribute most to the system failure probability.

Examples of fault trees can be found in [17.5] and [17.16]. The fault trees shown there for an industrial robot are not used for reliability analysis but rather for diagnosis, i.e. to visualize the diagnostic reasoning from symptoms to faults. It is then necessary to design one fault tree for each of the $n_f$ faults. The leaves of the tree are composed of the $n_s$ available symptoms. The decision which fault has occurred is ideally a simple binary evaluation of the different trees. In most applications, however, this binary representation is not sufficient. It is common that certain symptoms are not clearly recognized or are uncertain, so that a straightforward evaluation of the binary decision would not work. A possible strategy is then to select the tree whose activation could most easily be achieved by artificially changing the status of the symptoms at the leaves, i.e. to assume the fault of the tree that appears most active when checking the symptoms connected to it. Indeed, this strategy is similar to pattern matching: the fault is assumed whose known symptom pattern is closest to the observed one. Viewing the problem from this perspective, however, suggests a different procedure. Instead of laboriously matching tree structures, one should translate the problem into a functional one: the symptom occurrences are the inputs and the activation levels of the different faults are the outputs of the functional mapping that has to be determined. That strategy is followed by the other diagnosis methods presented in this chapter. It represents the standard procedure of any inference method.
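The pattern-matching view can be sketched as a nearest-neighbor search over binary symptom patterns: the fault whose stored pattern has the smallest Hamming distance to the observed symptoms is reported, and ties are returned together because the symptoms cannot resolve them. The fault patterns are hypothetical.

```python
def closest_faults(observed, patterns):
    """Faults whose binary pattern has minimal Hamming distance to `observed`."""
    dist = {f: sum(o != p for o, p in zip(observed, pat)) for f, pat in patterns.items()}
    best = min(dist.values())
    return sorted(f for f, d in dist.items() if d == best), best

patterns = {"F1": (1, 1, 0, 0), "F2": (0, 1, 1, 0), "F3": (0, 0, 1, 1)}
print(closest_faults((1, 1, 0, 0), patterns))   # (['F1'], 0): exact match
print(closest_faults((1, 1, 1, 0), patterns))   # (['F1', 'F2'], 1): one unclear symptom
```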

As discussed, the symptoms and events are considered as binary variables, and the condition part of the rules can be calculated by Boolean equations for parallel-serial connections, see, e.g. [17.5], [17.16]. However, this procedure has not proved successful, because faults and symptoms are in general of a continuous nature.

Fault detection of discrete-event systems on the basis of discrete events only is treated, e.g., in [17.57]. Adding the time of the events improves the situation [17.34]. Model-based methods for fault detection of timed discrete-event systems are described in [17.52], [17.56] and [17.53]. The application of fault diagnosis with binary fault trees to a discrete-event system is shown in the following example.

Example 17.1: Discrete-event fault detection for a pump-valve-filter system 

As an example of a discrete-event system, a plant consisting of an electrically driven pump, an electromotor-actuated valve, a filter and a fluid storage is considered, see Figure 17.3. Measurements are the fluid pressure after the pump $p_1$, the end position of the valve $s_1$ and the fluid mass flow $\dot{m}_1$. A sequential controller switches on the pump ($U_1 = 1$) and after a few seconds opens the valve ($U_2 = 1$), independently of the pressure measurement $p_1$. It is assumed that the controller functions correctly. The corresponding signals are shown in Figure 17.4.

a) Fault detection with discrete-event amplitude 

The measured outputs are binary values obtained by limit switches: 1 in normal operation, otherwise 0. Hence, in normal operation both the inputs and the outputs indicate the discrete event 1.

The task now consists of detecting faults in the components pump, valve and filter and in the applied sensors. Figure 17.5 shows a fault symptom tree for the case of three measurements. If the binary state of these measurements is 111, the plant is operating correctly. Based on a physical-logical inspection of the binary states of the sensors, a fault-symptom table can be established, see Table 17.1. Of the six possible faults, all can be detected and four of them can be isolated, but two are not isolable. Next, the number of sensors is reduced to two. Table 17.2 shows the results for different sensor combinations. Then five faults are possible. If the flow sensor $\dot{m}_1$ is included, all five faults are detectable, but only two are isolable and three cannot be isolated. If the flow sensor $\dot{m}_1$ (which measures the main output) is not used, but only the sensors $p_1$ and $s_1$, the filter fault F5 is not detectable, and four faults are detectable but not isolable. Using only the flow sensor, Table 17.3, allows all four possible faults to be detected, but they cannot be isolated. Table 17.4 summarizes the results for the different measurement configurations.



Fig. 17.3. Pump-valve-filter-storage plant

Fig. 17.4. Time history of control inputs and measurable outputs
Table 17.1. Fault symptom table for discrete events and three measurements $p_1$, $s_1$ and $\dot{m}_1$

fault        symptoms (measurable outputs)    isolable    not isolable
             p1      s1      ṁ1
no fault     1       1       1                -           -
F1           0       1       0                X
F2           0       1       1                X
F3           1       0       0                X
F4           1       0       1                X
F5           1       1       0                            X
F6           1       1       0                            X
Fig. 17.5. Fault symptom tree with binary events for the discrete-event system of Figures 17.3 and 17.4 and the case of three limit switch measurements for pressure, valve and flow


Table 17.2. Fault symptom table for two measurements: $p_1$ and $\dot{m}_1$; $s_1$ and $\dot{m}_1$; $p_1$ and $s_1$

Measurements $p_1$ and $\dot{m}_1$:

fault        p1      ṁ1      isolable    not isolable
no fault     1       1       -           -
F1           0       0       X
F2           0       1       X
F3           1       0                   X
F4           -       -       -           -
F5           1       0                   X
F6           1       0                   X

Measurements $s_1$ and $\dot{m}_1$:

fault        s1      ṁ1      isolable    not isolable
no fault     1       1       -           -
F1           1       0                   X
F2           -       -       -           -
F3           0       0       X
F4           0       1       X
F5           1       0                   X
F6           1       0                   X

Measurements $p_1$ and $s_1$:

fault        p1      s1      isolable    not isolable    not detectable
no fault     1       1       -           -               -
F1           0       1                   X
F2           0       1                   X
F3           1       0                   X
F4           1       0                   X
F5           1       1                                   X
F6           -       -       -           -               -
Table 17.3. Fault symptom table for one measurement, the mass flow $\dot{m}_1$

fault        ṁ1      isolable    not isolable    not detectable
no fault     1       -           -               -
F1           0                   X
F2           1       -           -               -
F3           0                   X
F4           1       -           -               -
F5           0                   X
F6           0                   X



Table 17.4. Comparison of detectable faults for different measurements

sensors                  p1 s1 ṁ1    p1 ṁ1    s1 ṁ1    p1 s1    ṁ1
possible faults              6          5        5        5      4
detectable faults            6          5        5        4      4
not detectable faults        0          0        0        1      0
isolable faults              4          2        2        0      0
not isolable faults          2          3        3        4      4


Hence, it is important that the main output signal of the pump system, the mass flow $\dot{m}_1$ (or at least the pressure $p_1$), is available as a measurement. Then, with all sensor configurations that include $\dot{m}_1$, whether with one, two or three sensors, all possible faults are detectable. However, with one sensor they cannot be isolated, i.e. they cannot be diagnosed. With an increasing number of sensors, more faults can be isolated, but not all of them.
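The counts in Table 17.4 follow mechanically from Table 17.1: restrict each fault pattern to the installed sensors, discard sensor faults of sensors that are not installed, call a fault detectable if its pattern differs from the fault-free pattern, and isolable if, in addition, no other possible fault produces the same pattern. The sketch below encodes this reasoning; the assignment of F1 to F6 to components is taken from Table 17.5, and `m1` is a plain-text stand-in for the mass flow $\dot{m}_1$.

```python
from collections import Counter

SENSORS = ("p1", "s1", "m1")          # m1 stands for the mass flow
NOMINAL = (1, 1, 1)
# Symptom patterns (p1, s1, m1) transcribed from Table 17.1
PATTERNS = {
    "F1": (0, 1, 0),  # pump
    "F2": (0, 1, 1),  # pressure sensor
    "F3": (1, 0, 0),  # valve
    "F4": (1, 0, 1),  # valve sensor
    "F5": (1, 1, 0),  # filter
    "F6": (1, 1, 0),  # flow sensor
}
# A sensor fault can only occur if that sensor is installed.
SENSOR_OF_FAULT = {"F2": "p1", "F4": "s1", "F6": "m1"}

def analyse(installed):
    """Count possible, detectable and isolable faults for the installed sensors."""
    idx = [SENSORS.index(name) for name in installed]

    def project(pattern):
        return tuple(pattern[i] for i in idx)

    possible = {}
    for fault, pattern in PATTERNS.items():
        sensor = SENSOR_OF_FAULT.get(fault)
        if sensor is None or sensor in installed:
            possible[fault] = project(pattern)

    nominal = project(NOMINAL)
    detectable = {f: p for f, p in possible.items() if p != nominal}
    occurrences = Counter(detectable.values())
    isolable = [f for f, p in detectable.items() if occurrences[p] == 1]
    return {
        "possible": len(possible),
        "detectable": len(detectable),
        "not detectable": len(possible) - len(detectable),
        "isolable": len(isolable),
        "not isolable": len(detectable) - len(isolable),
    }

for installed in [("p1", "s1", "m1"), ("p1", "m1"), ("s1", "m1"), ("p1", "s1"), ("m1",)]:
    print(installed, analyse(installed))
```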

b) Fault detection with discrete-event time intervals 

A further source of information for fault detection is the time intervals between reaching the discrete events. Measuring the time intervals between the time instants indicated in Figure 17.4 results in the fault symptom table shown in Table 17.5. If, for example, the pump does not deliver fluid or has too slow a dynamic response after the switch command $U_1 = 1$, the corresponding measured time interval is not within a prespecified threshold and therefore generates a 0. The table indicates that with four time intervals five faults can be isolated and two cannot, as in Table 17.1. This includes a CPU-program fault of not switching $U_2$ at the programmed time instant.
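Generating the time-interval symptoms amounts to a tolerance check per interval followed by a lookup in the fault-symptom table. The sketch uses the symptom patterns of Table 17.5 in column order; the nominal interval durations and the relative tolerance of 20 % are hypothetical values chosen for illustration.

```python
NOMINAL_INTERVALS = (2.0, 1.0, 0.5, 1.5)   # hypothetical nominal durations in s
TOL = 0.2                                  # hypothetical relative tolerance

# Symptom patterns transcribed from Table 17.5 (1 = interval within tolerance)
TABLE_17_5 = {
    "F1 pump": (1, 0, 1, 0),
    "F2 pressure sensor": (1, 0, 1, 1),
    "F3 valve": (1, 1, 0, 0),
    "F4 valve sensor": (1, 1, 0, 1),
    "F5 filter": (1, 1, 1, 0),
    "F6 flow sensor": (1, 1, 1, 0),
    "F7 CPU-program for U2": (0, 1, 0, 0),
}

def interval_symptoms(measured):
    """1 if an interval lies within the tolerance band around its nominal value, else 0."""
    return tuple(int(abs(m - n) <= TOL * n) for m, n in zip(measured, NOMINAL_INTERVALS))

def diagnose(measured):
    """Faults whose stored pattern matches the observed symptoms (empty list: no fault)."""
    pattern = interval_symptoms(measured)
    if pattern == (1, 1, 1, 1):
        return []
    return [f for f, p in TABLE_17_5.items() if p == pattern]

print(diagnose((2.1, 3.5, 0.5, 4.0)))   # pump too slow
print(diagnose((2.0, 1.0, 0.5, 2.5)))   # filter or flow sensor, not isolable
```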

If the time intervals for nominal behavior are assumed to be very large, the same information is gained as with the discrete events of the measured signals, but related to each other. Therefore, the information additionally obtained with the time intervals depends on the time behavior, or dynamics, of the components, which is not considered when observing the discrete-event amplitudes only. Hence, faults which influence the dynamics of the components are additionally covered by the time-interval observations.

Combining the time-interval fault detection of Table 17.5 with the discrete-event amplitude fault detection using three sensors as in Table 17.1 also allows too slow dynamics of four components to be isolated. Combination with the case of two sensors or one sensor improves the isolability of faults, especially for measurements with $p_1$, $s_1$ and $\dot{m}_1$. Hence, this example shows that symptom trees with binary events are well suited for discrete-event systems.


Table 17.5. Fault symptom table for measured time intervals between discrete events

                                symptoms: time intervals (Figure 17.4)
fault                           1       2       3       4        isolable    not isolable
no fault                        1       1       1       1
F1   pump                       1       0       1       0        X
F2   pressure sensor            1       0       1       1        X
F3   valve                      1       1       0       0        X
F4   valve sensor               1       1       0       1        X
F5   filter                     1       1       1       0                    X
F6   flow sensor                1       1       1       0                    X
F7   CPU-program for U2         0       1       0       0        X

□ 


17.2 Approximate reasoning 

For the rule-based fault diagnosis of continuous processes with continuous-valued symptoms, methods of approximate reasoning are more appropriate than binary decisions. Based on the available heuristic knowledge in the form of heuristic process models and weighting of effects, different diagnostic forward and backward reasoning strategies can be applied. Finally, the diagnostic goal is achieved by a fault decision which specifies the type, size and location of the fault as well as its time of detection [17.25], [17.26].

A review of developments in reasoning-oriented approaches for diagnosis was given in [17.27]. The major areas of interest are medicine [17.54], [17.49] and engineering [17.14]. Engineering research, especially with regard to reliability analysis of nuclear power stations, aerospace systems and electrical equipment, started much earlier and in many cases followed the concept of fault tree analysis [17.12], [17.24], [17.11], [17.10], [17.5], [17.23], [17.31], [17.15], [17.9]. On the other hand, artificial intelligence (AI) offered new methods for the treatment of cause-effect relations [17.43] and for diagnostic problem solving [17.36], [17.44], [17.45], [17.58]. Developments in AI were initially oriented towards medical diagnosis and were then extended to technical processes. Therefore, technical diagnosis at first also considered only heuristic symptoms. Later, more sophisticated diagnostic reasoning strategies were developed by increasing the level of abstraction, using the causalities as deep logical interdependencies, see, e.g. [17.47] and [17.30]. The fault-symptom trees known from engineering and the AI strategies for treating causalities can be brought together for fault diagnosis. In particular, the analytical symptoms generated by model-based fault detection then allow a deep fault diagnosis that pinpoints the possible physical fault origins (roots).

17.2.1 Forward chaining 

In forward chaining, the facts are matched with the premise of a rule and the conclusion is drawn as the logical consequence (modus ponens). Hence, with the symptoms $s_i$ as inputs, the possible faults $F_j$ are determined using the heuristic causalities.

In general, the symptoms have to be considered as uncertain facts. Therefore, it is feasible to represent all observed symptoms as confidence functions $c(s_i)$ or membership functions $\mu(s_i)$ of fuzzy sets in the interval [0, 1], especially in the unified form described in Section 15.1. For a short introduction to fuzzy logic, see Chapter 23.3.

a) Approximate reasoning with fuzzy logic 

With the structure of a fault-symptom tree, obtained by knowledge acquisition via 
the event-tree analysis (ETA) of the process, a fuzzy rule based system with multiple 
levels of rules can be established, as shown in Figure 17.6. 



Fig. 17.6. Signal flow in a fuzzy-rule-based system for fault diagnosis with two levels of rules 
and max/min operations and singletons as inputs 


The symptoms $s_i$ are now represented by fuzzy sets with linguistic meanings like “normal”, “less increased”, “much increased”, etc. The general procedure within the fuzzy IF-THEN rule-based system [17.35], [17.64], [17.13] then is as follows:
• inference: The fuzzy IF-THEN rule expresses a fuzzy implication relation between the fuzzy sets of the premise and the fuzzy sets of the conclusion. If the rule is interpreted by the Mamdani implication and the compositional rule of inference of [17.63] is applied, the following steps can be distinguished:

- matching of the facts with the rule premises (determination of the degrees of fulfillment of the facts to the rule premises → fit values);

- if the rule contains logic connections between several premises by fuzzy AND or fuzzy OR, the evaluation is performed by t-norms or t-conorms (the result then gives the degree of fulfillment of the facts to the complete premise);

- the evaluation of the resulting conclusion follows the max-min composition [17.63] (the degree of fulfillment of the premise restricts the fuzzy set of the conclusion);

• accumulation of all conclusion fuzzy sets if several rules contribute to the output: a union operation yields a global fuzzy set of the output;

• defuzzification to obtain crisp outputs: various defuzzification methods, e.g. center of gravity, maximum height and mean of maximum, can be used to obtain a crisp numerical output value. For fault diagnosis, however, defuzzification is often replaced by a simple maximum operation. For the conditional part of the IF-THEN rules, relatively simple fuzzy-logic operations are obtained by max-min composition to obtain the most possible fault:


$$\text{Fuzzy-AND:}\quad \mu(\eta) = \min\left[\mu(\xi_1), \ldots, \mu(\xi_v)\right] \tag{17.1}$$

$$\text{Fuzzy-OR:}\quad \mu(\eta) = \max\left[\mu(\xi_1), \ldots, \mu(\xi_v)\right] \tag{17.2}$$

For membership values restricted to 0 and 1, these operations agree with the corresponding operations of Boolean logic. However, some information is not used, as only the minimal or maximal value is taken. Another possibility is to use the prod-sum operations

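$$\text{Fuzzy-AND:}\quad \mu(\eta) = \mu(\xi_1)\,\mu(\xi_2)\cdots\mu(\xi_v) = \prod_{i=1}^{v} \mu(\xi_i) \tag{17.3}$$

$$\text{Fuzzy-OR:}\quad \mu(\eta) = 1 - \prod_{i=1}^{v} \left(1 - \mu(\xi_i)\right) \tag{17.4}$$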

In this case, all values are represented in the result. Note also the similarity to the corresponding equations of probability theory for independent events.

The NOT operation is in both cases

$$\text{NOT:}\quad \mu(\bar{\xi}) = 1 - \mu(\xi) \tag{17.5}$$
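Equations (17.1) to (17.5) translate directly into code. The sketch evaluates both operator pairs for two hypothetical sets of premise degrees that differ only in the second value: the min operator returns the same result for both, whereas the product reflects the change.

```python
import math

def fuzzy_and_min(mu):          # (17.1)
    return min(mu)

def fuzzy_or_max(mu):           # (17.2)
    return max(mu)

def fuzzy_and_prod(mu):         # (17.3)
    return math.prod(mu)

def fuzzy_or_probsum(mu):       # (17.4)
    return 1.0 - math.prod(1.0 - m for m in mu)

def fuzzy_not(m):               # (17.5)
    return 1.0 - m

a, b = [0.2, 0.7], [0.2, 0.9]   # hypothetical premise degrees
for mu in (a, b):
    print(mu, fuzzy_and_min(mu), round(fuzzy_and_prod(mu), 3),
          fuzzy_or_max(mu), round(fuzzy_or_probsum(mu), 3))
```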

The dimensions of the fuzzy rule-based system are given by

• number of symptoms $i = 1, 2, \ldots, n_s$;

• number of levels $l = 1, 2, \ldots, P$;

• number of rules per level $\nu = 1, 2, \ldots, N$;

• number of faults $j = 1, 2, \ldots$

The overall dimension may therefore grow strongly, even for small components or processes, so the software implementation is important. Mainly two procedures to perform the reasoning are known:

• Sequential rule activation from level $l$ to level $l + 1$ (horizontal procedure):

- matching of the symptoms $s_i$ with the premises of the rules of level $l = 1$ (including their logical operations) to determine the events $e_{1\nu}$;

- matching of the events $e_{(l-1)\nu}$, $l = 2, 3, \ldots, P$, with the premises of the rules of level $l$ to determine the events $e_{l\nu}$;

- matching of the events $e_{P\nu}$ with the fuzzy subsets of the faults $f_j$;

- accumulation of the fault fuzzy subsets to obtain the global possibility for each fault;

- fault decision by maximal possibility;

- defuzzification to obtain a crisp size of the most possible fault.

• Multiple rule activation from all symptoms to all faults (vertical procedure), using the fuzzy relational equation

$$F = S \circ R \tag{17.6}$$

proposed by [17.61], [17.60]:

- chaining $s_i \rightarrow e_{1\nu} \rightarrow \ldots \rightarrow e_{l\nu} \rightarrow \ldots \rightarrow f_j$ by multiple composition to obtain the overall relations $R_{s_i \times f_j}$ in the form of a matrix $R$;

- matching of $R$ with all current symptoms $S$ by using the fuzzy relational equation $F = S \circ R$;

- accumulation to obtain the global possibility for each fault;

- fault decision by maximal possibility;

- defuzzification to obtain a crisp size of the most possible fault.
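The core operation of the vertical procedure is the max-min composition $(S \circ R)_j = \max_i \min(s_i, r_{ij})$. The sketch composes a hypothetical symptom-event relation `R1` with an event-fault relation `R2` into one overall matrix and checks that applying it in a single step gives the same fault possibilities as chaining level by level; all membership values are invented for illustration.

```python
import numpy as np

def maxmin(A, B):
    """Max-min composition: (A o B)[i, j] = max_k min(A[i, k], B[k, j])."""
    A2 = np.atleast_2d(A)
    return np.minimum(A2[:, :, None], B[None, :, :]).max(axis=1)

S = np.array([[0.9, 0.3, 0.0]])           # current symptom degrees (1 x n_s)
R1 = np.array([[0.8, 0.1],                # symptoms -> events (n_s x n_e)
               [0.2, 0.9],
               [0.0, 0.5]])
R2 = np.array([[1.0, 0.3, 0.0],           # events -> faults (n_e x n_f)
               [0.1, 0.4, 0.9]])

R = maxmin(R1, R2)                        # overall relation symptoms -> faults
F_vertical = maxmin(S, R)                 # one-step evaluation, F = S o R
F_horizontal = maxmin(maxmin(S, R1), R2)  # level-by-level chaining
print(F_vertical, int(F_vertical.argmax()))   # fault decision by maximal possibility
```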

In both approaches, a defuzzification may or may not be performed after each level.

A similar approach using fuzzy relation equations was proposed earlier in [17.29], using deviations of directly measurable variables as symptoms and stating the fuzzy relation equation

$$F = S \circ R^* \tag{17.7}$$

The knowledge for the establishment of this equation stems from a fault-tree analysis (FTA). Then the inverse problem of (17.6) has to be solved by calculating the possibilities of the faults $F$, e.g. by using Sanchez's operator [17.48]. In [17.21], the relational fuzzy equation (17.6) is used for a diagnosis after a failure has happened (post-mortem diagnosis), searching for more or less binary events (symptoms) as causes. In this case, too, the inverse problem is solved.

Indeed, in many diagnosis applications the knowledge acquisition first follows a fault-tree analysis (FTA), proceeding from assumed faults through events to symptoms by inspecting the physical causality according to Figure 15.2b. If the knowledge is complete, a graphical representation like Figure 17.6 can be obtained directly, from which (17.6) can be established. In this way, solving the inverse problem can be circumvented, and basic fuzzy-logic software tools can be used directly.

To reduce the computational effort, simplifying assumptions may help. The following simplifications are possible:

• singletons for the symptoms: matching is reduced to direct fuzzification of crisp values;

• singletons for the events and faults: the conclusions are singletons with a height equal to the matching degree between facts and rule premise;

• universal fuzzy sets for the events and faults: the conclusion is a universal set with a height equal to the matching degree between facts and rule premise. In this case, however, the size of the event or fault cannot be determined.

In the cases of singletons or universal sets for events and faults, the conclusions are identical to the membership degrees of the facts combined by the logical operations in the premise. This means that the whole procedure reduces to an aggregation of the facts with appropriate t-norms and t-conorms and a chaining of the results, [17.25]. Fuzzy fault diagnosis can be extended to time-dependent faults and symptoms, including incipient, intermittent and multiple faults, [17.60].
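With singleton or universal conclusions, each rule therefore only combines membership degrees. The sketch below illustrates this under stated assumptions: two hypothetical rule levels (symptoms to events, events to a fault) are aggregated with a t-norm for AND and a t-conorm for OR and chained, once with the min/max pair and once with the product/probabilistic-sum pair. The rules, the degrees and the function names are illustrative only; which operator pair to use is a design choice not prescribed here.

```python
# t-norms (AND) and t-conorms (OR) on membership degrees in [0, 1]
def t_min(*a):
    return min(a)

def s_max(*a):
    return max(a)

def t_prod(*a):
    r = 1.0
    for x in a:
        r *= x
    return r

def s_probor(*a):
    """Probabilistic sum: 1 - prod(1 - a_i)."""
    r = 1.0
    for x in a:
        r *= 1.0 - x
    return 1.0 - r

def diagnose(mu, t_norm, t_conorm):
    """Chain two hypothetical rule levels: symptoms -> events -> fault."""
    # level 1: IF s1 AND s2 THEN e1;  IF s3 OR s4 THEN e2
    e1 = t_norm(mu["s1"], mu["s2"])
    e2 = t_conorm(mu["s3"], mu["s4"])
    # level 2: IF e1 AND e2 THEN f1
    return t_norm(e1, e2)

mu = {"s1": 0.8, "s2": 0.5, "s3": 0.4, "s4": 0.5}   # fuzzified symptoms
print(diagnose(mu, t_min, s_max))      # 0.5
print(diagnose(mu, t_prod, s_probor))  # approximately 0.28
```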

b) Simplified fuzzy logic reasoning for fault diagnosis 

The basic fuzzy logic operations for given continuous-variable symptoms are summarized in Chapter 23.4. For typical fault-diagnosis applications, the standard fuzzy system can be reduced. The main reason is the desired output of the diagnosis: instead of an arbitrary fuzzy set of a continuous variable, the output is a fault measure, i.e. a gradual measure of the possibility of the corresponding fault. If the observed symptoms are far from the linguistically defined pattern, this fault measure will be close to zero, whereas a perfect match yields a fault measure of one. Hence, the higher the fault measure, the more likely it is that the corresponding fault situation has occurred. The reduction of the rule consequences to a statement of which fault has occurred can be represented by a singleton value scaled by the rule fulfillment. Therefore, no other output membership functions are necessary, and defuzzification is not required either. The resulting simplified fuzzy logic system structure is shown in Figure 17.7.
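A minimal sketch of such a simplified system follows. Crisp symptoms are fuzzified with triangular membership functions, each rule premise is evaluated with the product t-norm, and the rule fulfillment itself is the fault measure (a singleton of height one scaled by the fulfillment), so no output membership functions and no defuzzification are needed. The triangle parameters, the rules, the product t-norm and the use of the maximum to combine several rules for the same fault are assumptions for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical linguistic terms on normalized symptoms
TERMS = {"small": (-1.0, 0.0, 1.0), "medium": (0.0, 1.0, 2.0)}

# Hypothetical rules: (premise as {symptom: term}, fault); premise terms joined by AND
RULES = [({"s1": "small", "s2": "medium"}, "fault_1"),
         ({"s3": "medium", "s4": "small"}, "fault_2")]

def fault_measures(symptoms):
    """Fault measure in [0, 1] per fault (product t-norm, max over rules)."""
    measures = {}
    for premise, fault in RULES:
        degree = 1.0
        for name, term in premise.items():
            degree *= tri(symptoms[name], *TERMS[term])
        # several rules concluding the same fault are combined by OR (max)
        measures[fault] = max(measures.get(fault, 0.0), degree)
    return measures

print(fault_measures({"s1": 0.0, "s2": 1.0, "s3": 0.5, "s4": 0.5}))
# {'fault_1': 1.0, 'fault_2': 0.25}
```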

c) Probabilistic fault-symptom reasoning 

Another possibility to cope with uncertain facts is to assign probabilities $P(\xi_i)$ to the symptoms (events) and $P(\eta_k)$ to the events (faults). Based on the causal tree, a simplified Bayesian network can be assigned, including conditional probabilities $P(\xi_i \mid \eta_k)$, [17.43].
Fig. 17.7. Simplified fuzzy logic system for fault diagnosis using singletons as fault measures in the range 0...1 (example rules: IF s1 small AND s2 medium THEN Fault; IF s3 medium AND s4 small THEN Fault)


With the assumption that the symptoms $\xi_i$ are statistically independent of each other and that the symptoms are statistically independent of the events $\eta$, it holds for an AND connection, [17.60]

$$P(\eta, \xi_1 \text{ AND } \xi_2) = P(\xi_1 \text{ AND } \xi_2 \mid \eta)\, P(\eta) = P(\xi_1)\, P(\xi_2)\, P(\eta) \qquad (17.8)$$

Similarly, one obtains for an OR connection

$$P(\eta, \xi_1 \text{ OR } \xi_2) = \left[P(\xi_1) + P(\xi_2) - P(\xi_1)\, P(\xi_2)\right] P(\eta) \qquad (17.9)$$

If in addition the events are assumed to have happened surely, $P(\eta) = 1$, one obtains for the AND connection

$$P(\eta, \xi_1 \text{ AND } \xi_2 \text{ AND} \dots \xi_\nu) = P(\xi_1)\, P(\xi_2) \cdots P(\xi_\nu) \qquad (17.10)$$

and for the OR connection

$$P(\eta, \xi_1 \text{ OR } \xi_2 \text{ OR} \dots \xi_\nu) = 1 - \prod_{i=1}^{\nu} \left(1 - P(\xi_i)\right) \qquad (17.11)$$

The similarity of these formulas to the fuzzy-logic operations with the prod-sum operation, e.g. (17.3), (17.4), is obvious. However, the assumptions on statistical independence do not take into account the existing causalities of the fault-symptom trees.
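The relations (17.8) to (17.11) can be evaluated directly. The following sketch assumes statistically independent symptoms, as the formulas do; the probability values and function names are hypothetical. It also makes the stated similarity concrete: `p_and` has the form of the product t-norm and `p_or` the form of the probabilistic sum.

```python
from math import prod

def p_and(probs):
    """(17.10): P(eta, xi_1 AND ... AND xi_nu) = prod P(xi_i), assuming P(eta) = 1."""
    return prod(probs)

def p_or(probs):
    """(17.11): P(eta, xi_1 OR ... OR xi_nu) = 1 - prod(1 - P(xi_i)), assuming P(eta) = 1."""
    return 1.0 - prod(1.0 - p for p in probs)

def p_with_event(p_symptoms, p_event, connective):
    """(17.8)/(17.9): scale the symptom term by P(eta) if the event is not certain."""
    return connective(p_symptoms) * p_event

print(p_and([0.9, 0.8]))                     # approximately 0.72
print(p_or([0.9, 0.8]))                      # approximately 0.98
print(p_with_event([0.9, 0.8], 0.5, p_or))   # approximately 0.49
```

For two symptoms, `p_or` reduces to $P(\xi_1) + P(\xi_2) - P(\xi_1) P(\xi_2)$, the bracket in (17.9).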

17.2.2 Backward chaining 

The strategy of backward chaining assumes the conclusion to be known and searches for all relevant premises (modus tollens). This is of particular interest if the symptoms are not complete. Therefore, the events and faults concluded by forward chaining with all known symptoms are first displayed to the operator. A refinement of the diagnosis can then be achieved by selecting the most plausible events and faults as hypotheses and applying backward chaining to ask for missing symptoms. Then forward chaining is restarted. This procedure is best implemented as an interactive dialogue and is repeated until terminated by the operator, [17.15], [17.16]. Other chaining strategies are Establish-Refine, Hypothesize-and-Test, depth-first and breadth-first search, etc., see [17.22]. If all symptoms have already been taken into account during forward reasoning, backward reasoning can be applied to validate the diagnosed faults, [17.61]. The fault diagnosis considered until now assumed that all symptoms appear simultaneously and do not change with time. Further cases are therefore dynamically developing faults and symptoms (e.g. incipient or intermittent faults), which lead to dynamic fault trees, [17.41], [17.60], as well as multiple faults.
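The interactive refinement loop can be sketched as follows. The rule base maps each hypothetical fault to the set of symptoms its premise requires (a crisp AND, for simplicity). Forward chaining scores each fault by the fraction of its required symptoms that are confirmed; backward chaining takes the highest-scoring fault as hypothesis and returns the symptoms that still have to be asked for. The scoring rule, the fault and symptom names, and the functions are illustrative assumptions, not a specific method from the cited works.

```python
# Hypothetical rule base: fault -> symptoms required in its premise (AND)
RULES = {
    "leak":         {"pressure_low", "flow_high"},
    "pump_wear":    {"pressure_low", "vibration_high", "current_high"},
    "sensor_fault": {"flow_high", "signal_noisy"},
}

def forward_chain(known):
    """Score each fault by the fraction of its required symptoms known to be present."""
    present = {s for s, v in known.items() if v}
    absent = {s for s, v in known.items() if not v}
    scores = {}
    for fault, needed in RULES.items():
        if needed & absent:          # a required symptom is known to be absent
            scores[fault] = 0.0
        else:
            scores[fault] = len(needed & present) / len(needed)
    return scores

def backward_chain(known):
    """Take the most plausible fault as hypothesis and list its missing symptoms."""
    scores = forward_chain(known)
    hypothesis = max(scores, key=scores.get)
    missing = sorted(RULES[hypothesis] - set(known))
    return hypothesis, missing

known = {"pressure_low": True}
print(backward_chain(known))   # ('leak', ['flow_high'])
known["flow_high"] = False     # operator answers the question
print(backward_chain(known))   # ('pump_wear', ['current_high', 'vibration_high'])
```

In a dialogue, the loop would repeat, asking for the listed symptoms and re-running forward chaining, until the operator terminates it.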

17.2.3 Summary and comparison 

The last two chapters have introduced some important approaches for fault diagnosis 
ranging from fault tree analysis to neural networks and approximate reasoning with 
fuzzy logic. Each method has its advantages and disadvantages. A summary of the 
characteristics of the methods is given in Table 17.6, [17.19]. 


Table 17.6. Comparison of the different fault diagnosis methods and their main characteristics. The performance evaluation is based on experience and assumes a typical fault-diagnosis problem with some explicit knowledge and a more or less complete training data set of medium complexity. Naturally, there are always examples for which one method or another performs better or worse. Therefore, the rating here is preliminary. ++/−−: strongly pos./neg., +/−: pos./neg., o: average; expl.: explicit knowledge base, impl.: implicit knowledge base.


Method                           Knowledge base   Design effort   Transparency   Performance
Classification  Bayes            impl.            +               −              −
                Nearest neighb.  impl.            ++              −              +
                Polynomial       impl.            +               −              +
                Decision tree    impl.                            o              o
                MLP-NN           impl.            o               −              +
                RBF-NN           impl.            o                              +
                Clustering       impl.            +
Inference       Fault tree       expl.                            ++
                Fuzzy logic      expl.                            ++             +


It is apparent that two different sources of information can be used: expert or domain (structured) knowledge on the one hand and measured data from fault experiments on the other. Each of the methods presented makes use of only one of the two. This leads either to highly transparent systems that are tedious to design or to non-transparent data-based classifiers. In reality, however, a mixture of both sources of information is usually available. Exploiting them in parallel, benefiting from reference data while also incorporating prior knowledge, would be an improvement. The combination of neural and fuzzy systems is such an approach and is treated in the next section.


17.3 Hybrid neuro-fuzzy systems¹

Combinations of neural network learning strategies with fuzzy logic elements have been discussed for about 15 years. The main objective of this approach is to overcome the knowledge acquisition bottleneck faced by humans when designing the knowledge base of a traditional fuzzy expert system. The neural network training techniques of neuro-fuzzy systems handle the information retrieval from data by means of optimization techniques. The fuzzy system representation, on the other hand, provides an intuitive understanding of the resulting system and makes it possible to integrate expert knowledge. Since the two approaches have different knowledge representations, their combination can be an attractive way to fuse information from different sources, namely human experts and experimental data. Neuro-fuzzy systems can generate new rules from data, or they can refine existing rules by adapting the parameters within them.

The main application areas of neuro-fuzzy systems are control and decision support systems. In control, the focus of neuro-fuzzy research has largely shifted towards the approximation of nonlinear functions, especially for modelling and system identification. In the context of fault diagnosis, however, the ability to build decision support systems is more important.

The following sections assume basic knowledge of fuzzy logic systems. If an introduction to the main terms is required, the reader is referred to Chapter 23.3.

17.3.1 Structures 

The combinations of neural networks and fuzzy logic are manifold. One can use neural networks as part of a fuzzy inference system, for instance to map membership functions, see Figure 17.8. It is also possible to use a neural network to represent certain important parameters of a fuzzy inference system, such as the parameters defining the fuzzy membership functions. On the other hand, a fuzzy rule base can be employed to specify parameters of a neural network that are difficult to determine, such as structural parameters like the number of hidden neurons. There, the fuzzy system automates the heuristics that would otherwise be applied by a human expert experienced in designing neural network topologies.

¹ Compiled by Dominik Füssel, [17.19]
Figure 17.9 visualizes some of the mentioned concepts. Furthermore, any serial or parallel combination of a fuzzy system with a neural network is possible; such systems are called general neuro-fuzzy systems, [17.3]. Here, the term is used in a narrower sense, limited to those systems in which the neural network and the fuzzy inference engine are not completely separate.


Fig. 17.8. Example of a neuro-fuzzy system: a neural network (NN) is used for the determination of fuzzy membership functions in a fuzzy inference system.


The most interesting unions of fuzzy systems with neural networks are those with a structural equivalence of the two systems. These systems are generally referred to as hybrid neuro-fuzzy systems. They consist of a network structure with a usually fixed number of layers, where every layer fulfills a specific function that can be expressed as part of a linguistic representation. This means that a hybrid neuro-fuzzy system is both a fuzzy inference system and a neural network at the same time. It can extract fuzzy rules from data by inspecting the result of a neural-network-type training algorithm. Some examples of neuro-fuzzy structures are shown in the following sections.

A relatively complete overview of rule-learning methods with neuro-fuzzy structures can be found in [17.37], [17.8].

a) Generic hybrid neuro-fuzzy model 

The different hybrid neuro-fuzzy systems that can be found in the literature vary in 
their structure and learning algorithms. In the following, a simple, generic hybrid 
neuro-fuzzy model will be outlined to show the main parts of such systems and to 
explain their function. 

A hybrid neuro-fuzzy network is basically composed of the same elements as a standard fuzzy inference engine. This is depicted in Figure 17.10. The first task is a fuzzification that is performed in a fuzzification layer of the network. This is followed by a neural processing unit consisting of one or more layers of neurons that constitute the logic of the fuzzy rules. Common elements are, for instance, units computing a t-norm (logic “and”) by a multiplication or minimum operation.
Fig. 17.9. Training configurations utilizing a combination of neural and fuzzy elements



Fig. 17.10. Generic neuro-fuzzy structure for regression and classification (decision support). 


The activations of the rules are then transmitted to the last element, the defuzzification. Here, one has to clearly distinguish two situations: systems used for classification and those employed for regression. A regression problem is the approximation of a usually continuously valued one- or higher-dimensional function. For the one-dimensional case, this refers to a function $y = f(x)$, where $y$ can assume any value in the continuous output domain. Regression networks therefore usually possess a defuzzification with output membership functions or singletons, and network elements that perform, for instance, a normalized weighted sum, which in the case of singletons is equivalent to a center-of-gravity defuzzification.

Classification problems, on the other hand, are mappings from a continuous, discrete or mixed input domain to a discrete output domain. To distinguish them from regression problems, a different input/output notation is used in the following: considering the application of fault diagnosis, the inputs are again denoted $s_i$ (for symptoms) and the outputs $f_j$ (fault measures). For an $n_f$-class problem, an $n_f$-dimensional output domain is created. The classes are simply indicated by a 0/1 scheme, i.e. the desired output values are only 0 or 1. If a training data point belongs to class 3, the values of all but dimension 3 are zero, whereas the third output equals one. During normal use, the classification system is expected to deliver an equivalent statement. Since the inputs to the network might still be continuous, this can only be realized by some hard threshold or decision function. It is common to use a maximum decision acting on the activations of the rules, i.e. the results of the neural processing units. This maximum filters out the strongest rule activation, which in turn becomes the classification statement of the system. Hence, the defuzzification of such hybrid neuro-fuzzy systems is only a maximum operation. It is common, however, to omit even that maximum operation or to associate it with a later processing step. This is beneficial because it is often useful to access the activations of the non-winning rules to make a statement about the reliability of the maximum decision.
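The generic structure can be sketched as a short forward pass: a fuzzification layer with Gaussian membership functions, a rule layer computing the product t-norm, and then either a normalized weighted sum of singletons (regression) or a maximum decision that yields a 0/1 fault vector (classification). The rule centers, widths, singletons and class assignments below are hypothetical, and the Gaussian/product choice is only one of the options mentioned in the text.

```python
import math

def gauss(x, c, sigma):
    """Gaussian membership function (fuzzification layer)."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

# Hypothetical rule base on two inputs: per rule one (center, sigma) per input
RULES = [[(0.0, 1.0), (0.0, 1.0)],
         [(2.0, 1.0), (2.0, 1.0)],
         [(0.0, 1.0), (2.0, 1.0)]]
SINGLETONS = [1.0, 5.0, 3.0]   # regression: output singleton of each rule
RULE_CLASS = [0, 1, 2]         # classification: fault class of each rule
N_F = 3                        # number of fault classes

def rule_activations(x):
    """Rule layer: product t-norm over the fuzzified inputs."""
    return [math.prod(gauss(xi, c, s) for xi, (c, s) in zip(x, rule))
            for rule in RULES]

def regress(x):
    """Normalized weighted sum of singletons (center of gravity for singletons)."""
    mu = rule_activations(x)
    return sum(m * y for m, y in zip(mu, SINGLETONS)) / sum(mu)

def classify(x):
    """0/1 fault vector from the maximum decision; raw activations kept as well."""
    mu = rule_activations(x)
    winner = max(range(len(mu)), key=mu.__getitem__)
    f = [1.0 if j == RULE_CLASS[winner] else 0.0 for j in range(N_F)]
    return f, mu   # non-winning activations indicate how reliable the decision is

f, mu = classify([0.0, 0.0])
print(f)                    # [1.0, 0.0, 0.0]
print(regress([0.0, 0.0]))
```

Returning `mu` alongside the decision reflects the remark above that the non-winning activations are useful for judging the reliability of the maximum decision.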

Translated into fuzzy logic, rules of regression systems have a structure like

$R_i$: IF ... THEN $y$ is large (17.12)

whereas a classification rule looks like

$R_i$: IF ... THEN class #1 (17.13)

For classification rules, output fuzzy attributes do not exist. The maximum decision acts directly on the activation of the rule premises. The conclusion of the rule with the strongest activation is the classification statement.

It is important to note, however, that most concepts of neuro-fuzzy regression systems easily translate to classification problems. Therefore, the concepts presented in this section apply to both types.

In the following, some example hybrid neuro-fuzzy approaches are given to clarify the typical structure and elements of such systems. The examples are selected to show some differing concepts and architectures present in neuro-fuzzy systems.

b) Mamdani and Singleton neuro-fuzzy systems 

A typical example of hybrid neuro-fuzzy systems is the structure suggested by [17.32]. It is known as the Fuzzy Adaptive Learning Control/Decision Network (FALCON). The network structure maps Mamdani-type fuzzy rules like

$R_j$: IF $s_1$ is $A_1$ AND $s_2$ is $A_2$ ... AND $s_n$ is $A_n$ THEN $y$ is $B$ (17.14)

A set of rules like (17.14) with identical conclusions can further be combined with a logical OR. Figure 17.11 shows the overall structure of the FALCON network.

This neuro-fuzzy system has typical properties: the tasks of the network layers can be understood linguistically. The network is not equipped with the weights that store the information in traditional connectionist models; here, all links have unit weights. Only the links leading to the defuzzification layer can also be interpreted as having weights, if the parameters of the membership functions are taken as such. Furthermore, the network is not fully interconnected, as a normal MLP would be.
Fig. 17.11. Hybrid neuro-fuzzy system FALCON following [17.32]


The same neuro-fuzzy structure is implemented by [17.46], who apply it for a neuro-fuzzy diagnosis. The only structural difference from the method described above is the use of the maximum as OR operator. The use of FALCON for the diagnosis of incipient motor faults is also reported by [17.2].

A comparable structure was also proposed by [17.39]. Triangular membership functions are used in their Neuro-Fuzzy Classification System (NEFCLASS). The inference employs min-max operators. Alternatively, however, it is suggested to compute a weighted sum as t-conorm in the last layer. The weights are fixed to one, which leads to a simple summation of the rule activations as OR operator.
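The two OR variants can be compared in a small sketch of this kind of classifier: triangular membership functions, a minimum t-norm in the rule units, and in the output layer either the maximum or the unit-weight sum of the rule activations per class. The fuzzy sets, rules and function names are hypothetical; this only illustrates the described inference, not the full NEFCLASS system.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy sets and rules: premise = one fuzzy set per input, conclusion = class
SETS = {"low": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)}
RULES = [(("low", "low"), 0), (("high", "low"), 1), (("low", "high"), 1)]

def class_outputs(x, n_classes=2, or_op="max"):
    """Rule units: min t-norm; output units: max, or sum with unit weights."""
    out = [0.0] * n_classes
    for premise, cls in RULES:
        act = min(tri(xi, *SETS[name]) for xi, name in zip(x, premise))
        out[cls] = max(out[cls], act) if or_op == "max" else out[cls] + act
    return out

x = (0.25, 0.5)
print(class_outputs(x))               # [0.5, 0.5]
print(class_outputs(x, or_op="sum"))  # [0.5, 0.75]
```

In this example the maximum leaves both classes tied, while the summation favors class 1 because two rules support it. Note that the sum is not bounded by one.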

A different network structure is proposed by [17.3]. Here, the distinction between the logic AND and OR is broken up. Instead, the System for Adaptive Rule Acquisition with Hebbian Learning (SARAH) uses a so-called ANDOR neuron model that is able to represent both a t-norm and a t-conorm, see Chapter 23.3. It is a parametric composite of the bounded difference operator (a t-norm) and the bounded sum operator (a t-conorm). Its feature is a continuous change between a logic AND and a logic OR by means of a parameter $\alpha$. The operator is then approximated by a sigmoidal function, so that the resulting neuron resembles a standard sigmoidal perceptron.

The function of the individual neurons is determined during training. The input is fuzzified by Gaussian membership functions in the antecedent layer of the SARAH structure. Within the output layer, the rule activation scales the associated output singletons. This yields a crisp output value.
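The exact SARAH parametrization is not given here. The sketch below therefore uses an assumed, hypothetical form that shows the idea: both the bounded difference $\max(0, \sum_i x_i - (n-1))$ and the bounded sum $\min(1, \sum_i x_i)$ are clipped linear functions of $\sum_i x_i$, so the unit $\mathrm{clip}(\sum_i x_i - \alpha (n-1), 0, 1)$ moves continuously from OR ($\alpha = 0$) to AND ($\alpha = 1$). Replacing the clipping by a sigmoid gives a perceptron with unit weights and bias $-\alpha(n-1) - 1/2$.

```python
import math

def andor_exact(x, alpha):
    """Hypothetical ANDOR unit: clip(sum(x) - alpha*(n-1), 0, 1).

    alpha = 1 -> bounded difference max(0, sum(x) - (n-1))  (logic AND)
    alpha = 0 -> bounded sum       min(1, sum(x))           (logic OR)
    """
    z = sum(x) - alpha * (len(x) - 1)
    return min(1.0, max(0.0, z))

def andor_sigmoid(x, alpha, beta=4.0):
    """Smooth approximation: sigmoidal perceptron with unit weights and a bias.

    beta = 4 matches the slope 1 of the clipped line at its midpoint z = 0.5.
    """
    z = sum(x) - alpha * (len(x) - 1)
    return 1.0 / (1.0 + math.exp(-beta * (z - 0.5)))

x = [0.75, 0.5]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, andor_exact(x, alpha), round(andor_sigmoid(x, alpha), 3))
```

For crisp inputs, `andor_exact` reproduces the truth tables of AND ($\alpha = 1$) and OR ($\alpha = 0$). In a trained network, $\alpha$ would be one of the parameters adapted during training.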

c) Takagi-Sugeno neuro-fuzzy systems 

The second large class of neuro-fuzzy systems, which has gained importance in recent years, is given by networks with a Takagi-Sugeno (TS) structure. The difference from the networks described above lies in the consequence of the fuzzy rules. Instead of a fuzzy attribute or singleton, the consequence is given as a linear or nonlinear function of the inputs. The counterpart of a rule like (17.14) is then

$R_i$: IF $s_1$ is $A_1$ AND $s_2$ is $A_2$ ... AND $s_{n_s}$ is $A_{n_s}$ THEN $f_j = f_j(\mathbf{s})$ (17.15)

The most frequent case is a linear function in the consequence:

$R_i$: IF $s_1$ is $A_1$ AND $s_2$ is $A_2$ ... AND $s_{n_s}$ is $A_{n_s}$
THEN $f_j = w_0 + w_1 s_1 + w_2 s_2 + \dots + w_{n_s} s_{n_s}$ (17.16)

The overall output is then determined as the weighted sum over all consequences.
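A minimal sketch of this TS evaluation for one input follows. The rule activations are normalized (as in the normalization step described for ANFIS below) and then weight the local linear consequences of type (17.16). The two rules, their Gaussian membership functions and the weights are hypothetical values chosen for illustration.

```python
import math

def gauss(x, c, sigma):
    """Gaussian membership function."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

# Hypothetical TS model with one input s and two rules:
#   R1: IF s is "low"  THEN f = w10 + w11*s
#   R2: IF s is "high" THEN f = w20 + w21*s
MSF = [(0.0, 1.0), (4.0, 1.0)]     # (center, sigma) of "low" and "high"
W = [(1.0, 0.5), (3.0, -0.25)]     # (w0, w1) of each rule consequence

def ts_output(s):
    """Normalized weighted sum of the local linear consequences."""
    mu = [gauss(s, c, sig) for c, sig in MSF]
    phi = [m / sum(mu) for m in mu]             # normalized rule activations
    local = [w0 + w1 * s for w0, w1 in W]       # local linear models
    return sum(p * f for p, f in zip(phi, local))

print(ts_output(2.0))   # 2.25: both rules equally active, mean of 2.0 and 2.5
print(ts_output(0.0))   # close to 1.0: rule "low" dominates
```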

A frequently employed approach is the Adaptive-Network-based Fuzzy Inference System (ANFIS) proposed by [17.28]. The inputs are used twice: they are inputs of the fuzzification layer and, at the same time, inputs of the consequences. The membership functions are Gaussian functions, and the t-norm is a product operator. An important step is the normalization of the membership functions performed in the 4th layer.

A very similar structure is present in the Local Linear Model Tree (LOLIMOT) approach presented by [17.40]. The only difference from the ANFIS structure is the use of multi-dimensional membership functions, whereas ANFIS uses a grid-based structure of membership functions with an explicit AND operator. The two approaches further differ in their learning algorithms.

TS fuzzy systems are usually employed for the approximation of nonlinear functions in general or, more specifically for control, in regression problems as part of dynamic model identification. As tools for dynamic modelling, they have been used for fault detection with multiple models as well as for the generation of nonlinear parity equations. The design and applications of nonlinear parity equations from these models are described in [17.4]; other examples of TS models in fault detection applications can be found in [17.33] and [17.51].

As function approximation schemes, TS systems are not typical classification 
tools. However, they can be used as local polynomial classification systems for fault 
diagnosis, [17.19]. 

17.3.2 Identification of membership functions 

The first step in the training of most neuro-fuzzy systems is to create a set of membership functions that constitute the fuzzy attributes like “small” or “large”. Many different approaches exist to create these fuzzy attributes. To characterize them, one first has to distinguish whether the membership functions are univariate or defined on the complete multi-dimensional input domain. In the first case, one always ends up with fuzzy systems based on some kind of grid, since the segmentation of the input space results from combining the individual membership functions. Multi-dimensional functions offer a more general segmentation of the input domain, which is helpful for the classification performance but impractical for a good understanding of the system.
The design of the membership functions can follow three basic principles:

• Manual construction: The first and simplest choice is to design the shape and number of the functions manually. This can be done purely following a grid-based approach or, usually preferably, from prior knowledge through visual inspection of the data or knowledge of the physical behavior of the input data. In the latter case, it is desirable to choose the parameters defining the functions so that the different classes can be optimally separated using the selected functions.

• Concurrent rule and MSF construction: Some of the algorithms design the rule base concurrently with the membership functions (MSF). This includes in particular the various procedures relying on splits of the input domain. These methods are not examined further in this section.

• Batch data-based construction: Mainly clustering approaches are used for data-based construction algorithms. Different competitive training algorithms are also applied.

Besides these general construction procedures, an error-based refinement of the functions is usually added during training of neuro-fuzzy systems.

The data-based construction of fuzzy attributes is most frequently implemented with a clustering approach. [17.3], [17.42] and [17.59] report the use of unsupervised input-space clustering approaches. They are based on the assumption that data points close to each other in the symptom space also belong to the same class and hence can be described with a joint attribute or cluster. Once the clusters are found and every training data point is assigned to a cluster center, that prototype forms the basis of the function, with the spread of the points specifying its width.

Most frequently, the Fuzzy C-Means (FCM) algorithm, [17.6], a fuzzified version of the hard c-means clustering algorithm, is applied.
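The following NumPy sketch implements the standard Fuzzy C-Means iteration and then derives Gaussian membership-function parameters as described above: the cluster prototype becomes the center, and the membership-weighted spread of the points becomes the width. The function names, the fuzzifier $m = 2$, the random initialization and the weighted standard deviation as width are assumptions for illustration; other width definitions are equally possible.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Fuzzy C-Means clustering.

    X: (N, d) data, c: number of clusters, m > 1: fuzzifier.
    Returns centers V (c, d) and memberships U (c, N) whose columns sum to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                                  # memberships of each point sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = Um @ X / Um.sum(axis=1, keepdims=True)      # membership-weighted centers
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)   # (c, N) distances
        D = np.fmax(D, 1e-12)                           # avoid division by zero
        U_new = D ** (-2.0 / (m - 1.0))                 # u_ik proportional to d_ik^(-2/(m-1))
        U_new /= U_new.sum(axis=0)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U

def msf_widths(X, V, U, m=2.0):
    """Width per cluster and dimension: membership-weighted standard deviation."""
    Um = U ** m                                         # (c, N)
    diff2 = (X[None, :, :] - V[:, None, :]) ** 2        # (c, N, d)
    var = (Um[:, :, None] * diff2).sum(axis=1) / Um.sum(axis=1)[:, None]
    return np.sqrt(var)                                 # (c, d)

# Usage: one symptom with two well-separated groups of data points
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0.0, 0.5, 50), rng.normal(5.0, 0.5, 50)])[:, None]
V, U = fuzzy_c_means(X, c=2)
sigma = msf_widths(X, V, U)
print(V.ravel(), sigma.ravel())   # centers near 0 and 5, widths near 0.5
```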

17.3.3 Identification of rules with predefined membership functions 

The second step in the training of neuro-fuzzy systems is the identification of rules. The problem addressed with neuro-fuzzy systems is that conventional fuzzy models suffer from combinatorial rule explosion: the model complexity grows exponentially with the input dimension. Most problems, though, inherently possess a much smaller complexity. Typically, at most 50 to 100 fuzzy rules should suffice for many technical diagnosis applications. Besides the fact that such high complexity is usually not required, one should consider that transparency, the main reason for using neuro-fuzzy approaches, is lost if the system grows too complex. Consequently, highly complex problems are not appropriately tackled with neuro-fuzzy approaches and should be solved with other algorithms.

For the case that the membership functions are already determined, one observes two main general methodologies for finding the rules: Sequential Backward Elimination (SBE) and Sequential Forward Selection (SFS).

a) Sequential backward elimination (SBE) 

The approaches based on Sequential Backward Elimination start with all possible rule combinations that can be built with the defined membership functions. Then a selection process eliminates rules that do not apply to the particular problem. The approach is restricted to diagnosis problems with not too many symptoms. The reason is simple: because of the combinatorial rule explosion, it is infeasible to work with higher-dimensional problems. In general, $n_s$ symptoms (inputs), each with $n_a$ fuzzy attributes defined on the input domain, lead to $n_{\text{premise}} = n_a^{n_s}$ possible premise terms. Denoting the number of faults (or of output membership functions in the case of regression problems) by $n_f$ gives a total of $n_{\text{rules}} = n_f\, n_{\text{premise}}$ possible rules. Even if the rules are limited to $n_p$ premise terms, one has to consider

$$\sum_{i=1}^{n_p} \binom{n_s}{i}\, n_a^{\,i} \qquad (17.17)$$

possible rule premises. For the relatively moderate case of 5 fuzzy attributes, 10 symptoms and 3 faults, this yields $3 \cdot 5^{10} \approx 30$ million possible rules.
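The counts can be checked with a few lines of Python. The full count is $n_f\, n_a^{n_s}$. For premises restricted to at most $n_p$ terms, the sketch assumes the form $\sum_{i=1}^{n_p} \binom{n_s}{i} n_a^{\,i}$: choose $i$ of the $n_s$ symptoms, then one of $n_a$ attributes for each. The function names are hypothetical.

```python
from math import comb

def n_rules_full(n_s, n_a, n_f):
    """All rules using one attribute for every symptom: n_f * n_a**n_s."""
    return n_f * n_a ** n_s

def n_premises_limited(n_s, n_a, n_p):
    """Premises using at most n_p of the n_s symptoms (assumed form of the count)."""
    return sum(comb(n_s, i) * n_a ** i for i in range(1, n_p + 1))

print(n_rules_full(10, 5, 3))        # 29296875, i.e. about 30 million
print(n_premises_limited(10, 5, 2))  # 1175
```

With $n_p = n_s$, the sum equals $(n_a + 1)^{n_s} - 1$, because every symptom then either takes one of the $n_a$ attributes or is left out of the premise.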

A typical example of the SBE approaches is realized in the learning of the SARAH network. [17.3] reports the following training scheme: a matrix $R$ is created with $n_{\text{premise}}$ rows and $n_f$ columns, so that every possible rule is represented by an entry in $R$. The matrix elements are initialized with zero. The training requires that input membership functions exist. The algorithm then iterates through the training data. For each data point, the membership values of all fuzzy attributes and, from them, the rule premise activations are computed. The strongest activation wins. In a similar way, the strongest output attribute is determined, and the corresponding entry in $R$ is then increased by a certain value. The procedure repeats until no data points are left. It is referred to as Hebbian learning.
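The training scheme can be sketched for a classification problem, where the "strongest output attribute" is simply the known fault class of each sample. All premises are enumerated, the winning premise per sample is found, and its entry in the matrix $R$ is increased. The min t-norm for the premise activation, the unit increment, the threshold used to read off rules, the attributes and the data are assumptions for illustration.

```python
import itertools

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical attributes, identical for every input
ATTRS = {"low": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)}

def hebbian_rule_matrix(data, n_inputs, n_f, step=1.0):
    """Count co-activations of winning premise and fault class (matrix R)."""
    premises = list(itertools.product(ATTRS, repeat=n_inputs))   # all premises
    R = [[0.0] * n_f for _ in premises]
    for x, fault in data:
        acts = [min(tri(xi, *ATTRS[a]) for xi, a in zip(x, p)) for p in premises]
        winner = max(range(len(premises)), key=acts.__getitem__)
        R[winner][fault] += step        # strengthen the winning premise/fault pair
    return premises, R

def extract_rules(premises, R, threshold=1.0):
    """Keep, per premise, the strongest fault if its entry reaches the threshold."""
    rules = []
    for p, row in zip(premises, R):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            rules.append((p, j))
    return rules

DATA = [((0.1, 0.2), 0), ((0.0, 0.1), 0), ((0.9, 0.8), 1), ((0.1, 0.9), 1)]
premises, R = hebbian_rule_matrix(DATA, n_inputs=2, n_f=2)
print(R)                           # [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
print(extract_rules(premises, R))
```

The enumeration of all premises is exactly what makes this approach infeasible for many symptoms.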

A nearly identical method was suggested by [17.32] for the rule structure determination of the neuro-fuzzy system FALCON from Figure 17.11. Instead of explicitly forming a matrix containing the association strengths, it is suggested to act directly on the connection weights of the neural structure.

The underlying idea of the mentioned and many other learning approaches is to find concurrent activations in the membership domain that occur in parallel with the given output activation. The rules or weights reflecting that input/output behavior are then strengthened or created in the first place. Competing rules that would contradict them can be weakened. A threshold of some kind finally decides which rules are kept and which are discarded. Since the approach is based on the association strength of the activations of the input and output patterns, it is referred to as activated learning.

b) Sequential forward selection (SFS) 

In contrast to the methods utilizing SBE, there are algorithms that successively build up the rule base. The Sequential Forward Selection methods start with an empty rule base and iteratively add new rules to model the training data. The main distinctions between the different SFS algorithms are then the criteria used to choose a rule and to stop the rule-growing process.
The SFS approaches have to deal with contradictory or redundant rules, just as the SBE algorithms do. Here, the same checks and procedures that were mentioned in the previous section can be applied.

In general, the rule construction process of the SFS methods can be driven by 
the iteration through all data points or by the minimization of an appropriate overall 
error measure. The advantage of the first procedure in contrast to the SBE methods is 
that the complexity for activated learning scales only with the number of data points. 
This makes small but highly dimensional data sets feasible. 

The NEFCLASS approach, for instance, processes a new data point only if no existing rule describes its input/output relationship. In this way, the data points contributing new information are found. An activated learning step then determines whether the activation is strong enough to justify a new rule. [17.38] suggests different learning criteria: “simple” rule learning utilizes the data with the best classification performance, while “best per class” proceeds with one best point per class.
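A simplified sketch of this forward selection starts from an empty rule base. For each sample, it takes the antecedent made of the best-matching attribute per input and adds a rule only if no rule with that antecedent exists yet. The attributes, the data, the optional rule limit and the choice of the first sample's class as consequent are assumptions. The actual NEFCLASS consequent determination and its "simple" or "best per class" criteria are not reproduced here, and contradictory rules are not handled.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical attributes, identical for every input
ATTRS = {"low": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)}

def best_antecedent(x):
    """Per input, the attribute with the highest membership degree."""
    return tuple(max(ATTRS, key=lambda a: tri(xi, *ATTRS[a])) for xi in x)

def sfs_rule_learning(data, max_rules=None):
    """Grow the rule base from empty; only uncovered antecedents create rules."""
    rules = {}                                  # antecedent -> class
    for x, cls in data:
        ante = best_antecedent(x)
        if ante not in rules:                   # the point carries new information
            if max_rules is not None and len(rules) >= max_rules:
                break                           # stopping criterion
            rules[ante] = cls
    return rules

DATA = [((0.1, 0.2), 0), ((0.0, 0.1), 0), ((0.9, 0.8), 1), ((0.1, 0.9), 1)]
print(sfs_rule_learning(DATA))
# {('low', 'low'): 0, ('high', 'high'): 1, ('low', 'high'): 1}
```

The cost grows only with the number of data points, not with the number of possible premises.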

c) Identification of rules concurrent with membership functions 

Structurally different from the methods described above are the approaches that create the rule structure and the membership functions at the same time. They are usually based on an overall error measure, normally a least-squares objective function. Using all training data points, the overall objective is determined. As long as a further reduction of it is desired, more rules and membership functions are created to fit the training data better. This usually means some sort of segmentation of the input space or placing of locally valid basis functions, which continues until the necessary granularity of the data description is reached.

This concept is realized, for instance, in the Neural Gas approach by [17.17]. This method determines the centers of radial basis function networks by placing new basis functions at locations where the error is highest. A similar objective is followed by the LOLIMOT learning, which is based on a recursive splitting of the input domain, thereby concurrently creating the fuzzy input segmentation.

While the LOLIMOT approach and other RBF networks are difficult to interpret linguistically, other approaches use the error-based method more directly to create fuzzy systems: [17.1] presented a hyperbox approach for fuzzy classification. The approach is simple: the data of each of the two classes are fitted into a separate hyperbox. If two boxes overlap, the overlap is removed by creating smaller boxes within the overlapping region. This continues until all overlaps are removed. A projection of the boxes creates the fuzzy attributes.

17.3.4 Optimization methods 

Due to the equivalence of neuro-fuzzy systems with standard neural networks, they can be optimized with the same methods that are employed for normal neural networks. Depending on the structure of the neuro-fuzzy system, common linear and nonlinear optimization methods are used.
Since almost all authors report the use of least-squares objective functions, the optimization becomes especially simple for those parameters that influence the objective linearly. This usually applies to neuro-fuzzy systems with a structure equivalent to a radial basis function network, such as the LOLIMOT network, utilizing singleton or linear-function output layers.
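The following sketch shows why such parameters are easy to optimize. With fixed membership functions, a TS model $y = \sum_i \phi_i(s)\,(w_{i0} + w_{i1} s)$ is linear in all consequent weights, so a single least-squares (pseudo-inverse) step estimates them. The 1-D input, the Gaussian validity functions with fixed centers and width, and the function names are assumptions for illustration.

```python
import numpy as np

def validity(s, centers, sigma):
    """Normalized Gaussian validity functions phi_i(s) for a 1-D input, shape (N, n_rules)."""
    mu = np.exp(-0.5 * ((s[:, None] - centers[None, :]) / sigma) ** 2)
    return mu / mu.sum(axis=1, keepdims=True)

def fit_consequents(s, y, centers, sigma):
    """Least-squares estimate of the local linear weights (w_i0, w_i1) of all rules."""
    phi = validity(s, centers, sigma)
    X = np.hstack([phi, phi * s[:, None]])      # regressors for w_i0 and w_i1
    w, *_ = np.linalg.lstsq(X, y, rcond=None)   # single pseudo-inverse step
    n = len(centers)
    return w[:n], w[n:]

def predict(s, centers, sigma, w0, w1):
    """Model output: validity-weighted sum of the local linear models."""
    phi = validity(s, centers, sigma)
    return (phi * (w0[None, :] + w1[None, :] * s[:, None])).sum(axis=1)

s = np.linspace(0.0, 4.0, 41)
y = s ** 2                                      # nonlinear target
centers = np.array([0.0, 2.0, 4.0])
w0, w1 = fit_consequents(s, y, centers, 0.8)
err = predict(s, centers, 0.8, w0, w1) - y
print(np.sqrt(np.mean(err ** 2)))               # RMS error of the local linear fit
```

Only the consequent weights are estimated here. Centers and widths influence the output nonlinearly and would need a nonlinear optimization or a construction scheme such as those described above.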

Most structures, however, change parameters of the first and second layers,
corresponding to hidden relevance weights or membership function parameters. The
FALCON network, for instance, uses backpropagation learning of the membership
layer weights. A standard least-squares objective function is built from the squared
differences between the network output vector $\mathbf{f}$ and the desired target
vector $\mathbf{t}$, summed over the $k$ data points.

[17.32] state that the parameter optimization is performed relatively easily compared
to classical neural networks, because the clustering approach for finding initial
membership functions guarantees a good starting point for the parameters.

The NEFCLASS approach also tunes only the parameters of the membership functions.
A difference from the approach above is the selective optimization. Instead of
a parallel optimization of all MSF parameters, only a few membership functions
are adjusted in each optimization step: a single training pattern is presented to the
network and its outputs are computed. Each rule unit with a nonzero output error then
propagates its error back to the rule neuron. Since the conjunction operator in the rule
unit is a minimum, the membership function activation that dominates the others can
easily be identified: it is the one with the smallest activation level. Only the
parameters of that membership function are adjusted to yield a higher or lower
activation for the present data sample, depending on the sign of the error.
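The selective update can be sketched for a single rule and a single pattern. This is a minimal illustration, not the NEFCLASS implementation: the symmetric triangular membership functions, the update of the center only, and the learning rate are all assumptions of the sketch.

```python
def tri(x, center, width):
    """Symmetric triangular membership function (assumed shape)."""
    return max(0.0, 1.0 - abs(x - center) / width)


def nefclass_step(x, premise, error, rate=0.1):
    """Selectively adapt one rule for a single training pattern.

    x       : input pattern, one value per premise term
    premise : list of (center, width) pairs, one triangular MF per term
    error   : rule error; > 0 means the activation should increase
    Returns the index of the adapted MF and the updated premise.
    """
    acts = [tri(xi, c, w) for xi, (c, w) in zip(x, premise)]
    # with a minimum conjunction, the smallest activation dominates the rule
    k = min(range(len(acts)), key=acts.__getitem__)
    c, w = premise[k]
    # shift the center towards x (error > 0) or away from x (error < 0)
    updated = list(premise)
    updated[k] = (c + rate * error * (x[k] - c), w)
    return k, updated


k, new_premise = nefclass_step([0.5, 2.0], [(0.0, 1.0), (0.0, 5.0)], error=1.0)
print(k, new_premise[0])  # 0 (0.05, 1.0)
```

Only the first membership function (activation 0.5, below 0.6 for the second term) is moved; the second term stays untouched, which is the point of the selective scheme.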

The SARAH network differs from the ones above in that all of its parameters are
subject to optimization. [17.3] reports the use of an unconstrained backpropagation
algorithm.

The ANFIS and the LOLIMOT structures are both equipped with parameters that
influence the output error linearly. The weights of the linear functions in the
consequence layer (17.16) can be optimized in a single-step pseudo-inverse optimization.
The antecedent parameters of the ANFIS network are subsequently optimized with
a nonlinear optimization (gradient descent). The LOLIMOT algorithm differs in that
the antecedent layer parameters, which determine the positions of the MSF, are
found with a simple, iterative splitting procedure.
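The single-step estimation of the consequence parameters can be illustrated for a one-input model with fixed antecedents. The sketch assumes normalized Gaussian validity functions and local linear models $w_{i0} + w_{i1} u$; these choices and all names are illustrative, not the exact ANFIS or LOLIMOT formulation. Because the output is linear in the weights, one pseudo-inverse solves the least-squares problem.

```python
import numpy as np


def validity(u, centers, sigma):
    """Normalized Gaussian validity functions Phi_i(u) (assumed form)."""
    g = np.exp(-0.5 * ((u[:, None] - centers[None, :]) / sigma) ** 2)
    return g / g.sum(axis=1, keepdims=True)


def fit_consequents(u, y, centers, sigma):
    """Estimate local linear models w_i0 + w_i1*u in one pseudo-inverse step."""
    phi = validity(u, centers, sigma)          # shape (N, M)
    X = np.hstack([phi, phi * u[:, None]])     # regressors Phi_i and Phi_i*u
    theta = np.linalg.pinv(X) @ y              # least-squares solution
    m = len(centers)
    return theta[:m], theta[m:]                # offsets, slopes


def predict(u, centers, sigma, w0, w1):
    phi = validity(u, centers, sigma)
    return (phi * (w0[None, :] + w1[None, :] * u[:, None])).sum(axis=1)


u = np.linspace(0.0, 1.0, 50)
y = 2.0 * u + 1.0
centers, sigma = np.array([0.2, 0.8]), 0.3
w0, w1 = fit_consequents(u, y, centers, sigma)
```

Since the validity functions sum to one, the linear test data are reproduced exactly with both local models equal to $1 + 2u$.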

More details on this survey are given in [17.19]. There, conditions for transparency,
i.e. for useful rule examination and interpretation, are also stated, such as the
selection of membership functions, local behavior, weight constraints and a small rule base.

17.3.5 Self-learning classification tree (SELECT) 
a) SELECT structure

The SELECT approach according to [17.20], [17.19] combines ideas from the field
of decision trees with the adaptation properties of neural networks and the
interpretability of fuzzy logic. Its basic element is a simple neuro-fuzzy structure that
consists of a fuzzification layer, similar to those discussed in the previous sections, and
a neural AND-operator that is implemented in a straightforward manner as an approximated
bounded-sum operator. From the discussion in the previous section, a learning procedure
with a sequential forward selection (SFS) strategy seemed advisable for the
application of fault diagnosis. In addition, a constrained optimization is implemented to
retain the desired transparency.

Figure 17.12 shows this basic element, composed of the operator and the fuzzy
attribute layer. The pictured operator is equivalent to the ANDOR-operator from
Figure 23.4 in Chapter 23.3 with $a = 1$. Due to the sigmoidal approximation of the
threshold, this results in a soft version of the conjunction: the output can assume
values close to one even if not all inputs are equal to one.
Fig. 17.12. Basic processing element (rule $\mathcal{R}_l$) and OR-operation for output class $F_j$ of the
SELECT structure. From the left, the inputs $s_i$ are fed into the fuzzification layer. This yields
membership values $\mu_i^m$ that describe how strongly the symptom $s_i$ belongs to the fuzzy attribute
$A_i^m$. Multiple membership values are then fed into the AND-operator, which computes a weighted
sum. The activation function is then applied to this sum. Finally, the outputs of the AND-operators
that describe rules with the same rule conclusion are combined in the OR-operator
(“accumulation”). The output $f_j$ of the SELECT system is a measure for the corresponding fault
class $F_j$.


The connections from the fuzzy attribute $m$ to the operator neuron $l$ are equipped
with weights $w_{ml}$ that, in combination with the operator, can be interpreted as
relevance weights. This yields an output activation given by


$$o_l^{[2]} = \frac{1}{1 + e^{-\lambda\left(\sum_{m=1}^{n_p} w_{ml}\,\mu_i^{m} - 0.75\,n_p\right)}} \qquad (17.18)$$


Here, $n_p$ is the number of inputs into the operator. $\lambda$ controls the slope of the
function, thereby determining the “degree of fuzziness” of the operator. A large $\lambda$ brings
the operator closer to its binary equivalent. The threshold of the sigmoidal function
at $0.75\,n_p$ was chosen empirically to yield a fuzzy version of the AND-operation. The
symbol for the membership value is an abridged notation. In fact, the chosen fuzzy
attribute depends not only on $s_i$ and $m$ but also on the individual rule premise. Since
the fuzzy attribute can be any one of the $n_a(s_i)$ attributes defined for the symptom $s_i$,
a more precise notation would be:

$$\mu_i^{m} = \mu_i^{k(l,m)} \qquad (17.19)$$

where $k \in [1;\, n_a(s_i)]$. For simplicity, however, the abridged version is
kept for the remainder of this chapter.
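A minimal sketch of the soft AND-operator in the form of (17.18) shows how the threshold at $0.75\,n_p$ and the slope $\lambda$ act. The default $\lambda = 10$ is an arbitrary assumption, and the logistic function is written in a numerically stable form so that large $\lambda$ values do not overflow.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def and_operator(mu, w, lam=10.0):
    """Soft AND of n_p membership values following the form of (17.18)."""
    n_p = len(mu)
    weighted_sum = sum(wi * mi for wi, mi in zip(w, mu))
    return sigmoid(lam * (weighted_sum - 0.75 * n_p))


print(round(and_operator([1.0, 1.0], [1.0, 1.0]), 4))  # 0.9933
print(round(and_operator([1.0, 0.8], [1.0, 1.0]), 4))  # 0.9526
print(round(and_operator([1.0, 0.0], [1.0, 1.0]), 4))  # 0.0067
```

The second case illustrates the soft conjunction: with one input at 0.8 the output is still close to one, while a missing input drives it towards zero. Increasing `lam` makes the operator more binary.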

The special feature of the SELECT approach is the tree structure composed of
AND-operators. This creates an equivalent representation as a tree composed of fuzzy
rules. The nodes of the tree consist of AND-operators, while the leaves contain the
consequence of a rule, see Figure 17.13. The output $o_l^{[2]}$ of the operator determines
the rule fulfillment of the consequence. Multiple leaves can contain an identical
consequence. The degrees of fulfillment of these rules are then combined by an
appropriate t-conorm (OR-operation), such as the maximum, yielding the output $f_j$ of the
SELECT structure.



Fig. 17.13. Tree representation of a SELECT system. In this simple tree no OR-operators
are necessary, because no two rules have the same conclusion.


The processing of the tree is performed sequentially: after evaluating the top node,
a first neuron output $o_1^{[2]}$ is determined. The output of the next node is then
multiplied by $(1 - o_1^{[2]})$, so that its activation is given by
$o_2^{*} = (1 - o_1^{[2]})\,o_2^{[2]}$, where $o_2^{[2]}$ is the output from (17.18).
This multiplication is continued down the tree, i.e.
$o_l^{*} = o_l^{[2]} \prod_{j<l} (1 - o_j^{[2]})$, so that the sum of all activations
$o_l^{*}$ will not exceed 1.
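The top-down evaluation and the subsequent OR-accumulation can be sketched as below. The sketch assumes the AND-operator outputs are already computed in tree order and uses the maximum as t-conorm; the function names are hypothetical. The sum of the activations equals $1 - \prod_l (1 - o_l^{[2]})$ and therefore never exceeds 1.

```python
def sequential_activations(outputs, multiply=True):
    """Activations o*_l of tree nodes evaluated top-down.

    outputs  : AND-operator outputs o_l^[2] in tree order (top node first)
    multiply : if False, the product is omitted (as for large trees)
    """
    result, remaining = [], 1.0
    for o in outputs:
        result.append(o * remaining if multiply else o)
        remaining *= 1.0 - o  # share not yet claimed by the nodes above
    return result


def fault_measures(activations, conclusions):
    """OR-accumulation with the maximum as t-conorm: one value per fault class."""
    f = {}
    for a, c in zip(activations, conclusions):
        f[c] = max(f.get(c, 0.0), a)
    return f


acts = sequential_activations([0.9, 0.5, 0.8])  # approx. [0.9, 0.05, 0.04]
print(fault_measures(acts, ["F2", "F1", "F3"]))
```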

For large trees, which tend to create small outputs because the sigmoidal function
yields an output $o_l^{[2]} > 0$ even for small activations, the multiplication is
omitted. Otherwise, the multiplication would lead to ever decreasing outputs of the
subsequent neurons.

The steps for determining a SELECT system from data and prior knowledge consist
of the design of membership functions, followed by structure learning based on
prior knowledge, and finally a neural-network-type optimization. These steps are
explained below.
b) Learning algorithm

The learning approach is based on the observation that a number of faults must be
distinguished by relatively many symptoms, which span a symptom space that is only
sparsely populated because few experimental data are available. This makes sequential
backward elimination approaches unsuitable. Another observation is that not all
symptoms are necessary to isolate every fault situation. Instead, some faults are easily
identified by only one or two significant symptom reactions. This favors systems whose
complexity adapts to the difficulty of the decision.

The following learning approach has been inspired by decision tree learning
algorithms. As such, it is relatively unique in the area of fault diagnosis and
neuro-fuzzy systems. The procedure is iterative and relies on the successive reduction of
the training data set. Figure 17.14 shows its flowchart. The main steps, namely membership
function design, candidate rule selection and rule evaluation, are explained in the following.

Design of Membership Functions 

The design of membership functions can be based on different principles. Generally, 
any of the methods described in Section 17.3.2 is appropriate. A manual construction 
is advantageous if prior knowledge is present. 
Fig. 17.14. Iterative learning algorithm of SELECT.


For the SELECT approach, fuzzy clustering with the Fuzzy-C-Means (FCM)
algorithm based on a one-dimensional projection of the training data proved to work
sufficiently well. Due to the usually nonuniform distribution of measured symptoms,
however, a modified version, the Degressive FCM algorithm, was developed. It is
explained in [17.19]. Its main feature is a nonlinear transformation of the
cluster membership to accommodate uneven data distributions. The result is a set
of clusters that is usually finer around zero (symptom is not reacting) and broader
towards the limits of the symptoms. After determining the cluster centers and the
assignments of the data points to them, adjacent and highly overlapping clusters were
combined. A typical set of fuzzy attributes realized by the membership functions can
be seen in Figure 17.15.

The functional form of the fuzzy attributes was chosen to be the product of two 
sigmoidals. This is advantageous to realize nonsymmetric functions and is simple to 
implement. The function is: 


$$\mu_i^{m}(s_i) = \frac{1}{1 + e^{-a_1 (s_i - c_1)}} \cdot \frac{1}{1 + e^{\,a_2 (s_i - c_2)}} \qquad (17.20)$$
Fig. 17.15. Example of symptom membership functions from clustering (Clusters 1, 2 and 3).
The three cluster centers are marked in the picture.

The four parameters determine the exact shape and location of the membership
functions. The attributes at the limits of the data range were chosen to be single
sigmoidal functions.
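A product of two sigmoidal functions can be sketched as follows. The parameter names `a1`, `c1`, `a2`, `c2` (two slopes and two inflection points) are assumptions of this sketch; choosing different slopes yields the nonsymmetric attributes mentioned above, and the edge attributes use a single sigmoid.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def two_sigmoid_mf(s, a1, c1, a2, c2):
    """Product of a rising sigmoid (at c1) and a falling sigmoid (at c2).

    Assumed parameterization: a1, a2 > 0 are slopes, c1 < c2 are the
    inflection points; a1 != a2 gives a nonsymmetric attribute.
    """
    return sigmoid(a1 * (s - c1)) * sigmoid(-a2 * (s - c2))


def edge_mf(s, a, c):
    """Single sigmoid for an attribute at the limit of the data range."""
    return sigmoid(a * (s - c))


# steep rise at 0, gentle fall at 2: an asymmetric "medium" attribute
values = [two_sigmoid_mf(s, 10.0, 0.0, 2.0, 2.0) for s in (-0.2, 1.0, 2.2)]
```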

The approach to derive the parameters of (17.20) from the FCM clustering is as
follows: after the cluster centers are found, each data point is uniquely assigned to
the one cluster with center $v_m$ for which the FCM method yields its highest fuzzy
membership degree (usually the closest one). From all $N_m$ points that were assigned
to the cluster $v_m$, the standard deviation is estimated:

$$\sigma_m = \sqrt{\frac{1}{N_m - 1}\sum_{i=1}^{N_m} (s_i - v_m)^2} \qquad (17.21)$$

Combined with the cluster centers, the parameters of the membership functions are
chosen to fulfill two requirements:

• the membership must be at least $(1 - \varepsilon)$ at the cluster center (e.g. $\varepsilon = 0.03$);

• the intersection of two membership functions lies at the Bayes limit, i.e. at the
point that would be ideal to separate data from the two clusters if Gaussian
data distributions are assumed. This is the point where the probability densities
of the two neighboring Gaussian distributions have the same value; the two
membership functions intersect there at the value 0.5.

This way, the attributes are already good separators for the data if the clusters cover 
points from different fault classes. An additional normalization of the functions is 
not necessary. 
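The two requirements can be turned into numbers as sketched below, under stated assumptions: equal prior probabilities for the two neighboring clusters, a crossing point that exists between the two centers, and a single-sigmoid flank whose 0.5 point is placed at the Bayes limit. The helper names are hypothetical and this is not the derivation of [17.19].

```python
import math


def cluster_std(points, center):
    """Sample standard deviation of the points of one cluster about its center."""
    n = len(points)
    return math.sqrt(sum((p - center) ** 2 for p in points) / (n - 1))


def bayes_limit(v1, s1, v2, s2):
    """Point between v1 and v2 where two equal-prior Gaussian densities are equal."""
    if math.isclose(s1, s2):
        return 0.5 * (v1 + v2)
    # log-density equality gives a quadratic a*x^2 + b*x + c = 0
    a = 1 / s1**2 - 1 / s2**2
    b = -2 * (v1 / s1**2 - v2 / s2**2)
    c = v1**2 / s1**2 - v2**2 / s2**2 + 2 * math.log(s1 / s2)
    d = math.sqrt(b * b - 4 * a * c)
    roots = [(-b + d) / (2 * a), (-b - d) / (2 * a)]
    return next(r for r in roots if min(v1, v2) <= r <= max(v1, v2))


def sigmoid_slope(center, boundary, eps=0.03):
    """Slope a of 1/(1+exp(-a*(s-boundary))): 0.5 at the boundary, 1-eps at the center."""
    return math.log((1 - eps) / eps) / (center - boundary)


x = bayes_limit(0.0, 1.0, 3.0, 2.0)   # crossing point between the two clusters
a = sigmoid_slope(0.0, x)             # negative: falling flank right of center 0
```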

Selection of Candidate Rules 

Candidate rules are rules that promise to help in distinguishing the data of
different fault classes. They are found systematically and evaluated for performance
before they become elements of the SELECT tree structure shown in Figure 17.12.

The basic problem of the selection of candidate rules is this: given a set $S_t$ of
training data points consisting of data from the faults $F_j$, $j = 1, \ldots, n_r$, which fuzzy
rule $\mathcal{R}_a$ can be found that is activated by a subset $S_a$ such that:

1) If $\mathcal{R}_a$ is evaluated with all training data $S_t$, the set $S_a$ should yield a high rule
fulfillment, while the rest, $S_t \setminus S_a$, should create an activation as small as possible.

2) The data from $S_a$ should be as pure as possible. This means that all elements should
belong to only one fault $F_j$. Typically, $S_a$ should comprise all data from one
particular fault.

3) The set $S_a$ should be as large as possible.

Some of these requirements obviously contradict each other. The two extremes are: $S_a$
consists of only one data point, which is ideal for the second requirement but bad for the
third. If, on the other hand, $S_a = S_t$, the third requirement is optimally fulfilled, but
usually the second is not.

A sensible way to arrive at a useful rule $\mathcal{R}_a$ is to successively tighten its
condition, i.e. the rule premise of $\mathcal{R}_a$, and to check whether requirements 1 to 3
are fulfilled. This is done by taking more and more premise terms into account. This
procedure makes $\mathcal{R}_a$ only as complex as necessary.

The selection of a candidate rule $\mathcal{R}_a$ is based on the training data set. With the
membership functions, each data sample is assigned to the fuzzy attribute $A_i^m$ that
yields the highest membership function value $\mu_i^m$. That way, the training set is
translated into a training set $S_t^f$ with fuzzy attributes instead of observed symptom
quantities. The procedure continues as follows:

1) For each fault $F_j$, a candidate rule $\mathcal{R}_j$ is built. These rules start with a premise
containing a single term like “IF $s_3$ large”. These terms are selected as follows:
for each symptom $s_i$, the most frequent entry in $S_t^f$ belonging to fault class $F_j$
is found. The symptom for which this term appears most often in $S_t^f$ is
then chosen to build the starting rule for fault $F_j$.

That way, the number of rules equals the number of faults $F_j$;

2) All rules $\mathcal{R}_j$ are evaluated with the method explained below. If a satisfactory
rule $\mathcal{R}^*$ is found, it is kept. If none of the rules performs well enough,
their premises are extended by an additional input dimension, i.e. by an
additional term like “AND $s_5$ small”. Again, the performance is evaluated and, if
necessary, the rule premises are augmented until a specified complexity $n_{p,\max}$
is reached;

3) The chosen rule $\mathcal{R}^*$ forms a new neuron following Figure 17.12;

4) The correctly classified data set $S_{cj}$ is removed from the training set. This means
that the training data set decreases in size, which in turn makes the rule extraction
for the subsequent neurons simpler;

5) The procedure then repeats with the smaller training set.

The learning is finished as soon as the training set is empty, no useful rule can 
be found, or a predefined complexity is reached. The learning will lead to a tree-like 
structure of rules that are evaluated sequentially. 
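Steps 1 and 4 of the procedure can be sketched on data that are already translated into fuzzy attribute labels ($S_t^f$). This is a simplified, crisp illustration: the rule evaluation of step 2 and the premise extension are omitted, ties are resolved by taking the first symptom, and all names and the small data set are hypothetical.

```python
from collections import Counter


def starting_term(fuzzy_samples, labels, fault):
    """Step 1: (symptom index, term) that occurs most often in the data of `fault`."""
    rows = [x for x, y in zip(fuzzy_samples, labels) if y == fault]
    best = None
    for i in range(len(rows[0])):
        term, count = Counter(r[i] for r in rows).most_common(1)[0]
        if best is None or count > best[2]:
            best = (i, term, count)
    return best[0], best[1]


def matches(sample, premise):
    """Crisp premise check: every (symptom index, term) pair must match."""
    return all(sample[i] == term for i, term in premise)


def remove_covered(fuzzy_samples, labels, premise, fault):
    """Step 4: drop S_cj, the samples that fire the rule and belong to `fault`."""
    keep = [(x, y) for x, y in zip(fuzzy_samples, labels)
            if not (matches(x, premise) and y == fault)]
    return [x for x, _ in keep], [y for _, y in keep]


samples = ([("small", "small")] * 3
           + [("large", "small"), ("large", "large"), ("large", "large")]
           + [("small", "large")] * 2)
labels = ["F1"] * 3 + ["F2"] * 3 + ["F3"] * 2
print(starting_term(samples, labels, "F2"))  # (0, 'large'), i.e. "IF s1 large"
```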

Evaluation of Candidate Rules 

For the evaluation of $\mathcal{R}_a$ with the conclusion $F_j$ with respect to the requirements
above, a threshold $\delta$ with $S_a = \{ s \in S_t \mid o^{[2]}(s) > \delta \}$ is defined. This leads to
four subsets of the complete training data set $S_t$:

$$\begin{aligned}
S_{cj} &= \{ s \in S_t \mid o^{[2]}(s) > \delta,\ F(s) = F_j \} \\
S_{mj} &= \{ s \in S_t \mid o^{[2]}(s) \le \delta,\ F(s) = F_j \} \\
S_{mo} &= \{ s \in S_t \mid o^{[2]}(s) > \delta,\ F(s) \ne F_j \} \\
S_{co} &= \{ s \in S_t \mid o^{[2]}(s) \le \delta,\ F(s) \ne F_j \}
\end{aligned} \qquad (17.22)$$

Here, $S_{cj}$ denotes the data correctly classified by rule $\mathcal{R}_a$ as belonging to
fault $F_j$, and $S_{co}$ the data correctly classified as not belonging to $F_j$.
$S_{mj}$ and $S_{mo}$ are the sets of misclassified data. A data point counts as
classified if its output $o^{[2]}$ is greater than $\delta$. The cardinalities, i.e. the numbers
of elements of the sets, are denoted as $N_{cj}$, $N_{mj}$, $N_{co}$, $N_{mo}$. Additionally,
$N_j = N_{cj} + N_{mj}$ and $N_o = N_{co} + N_{mo}$. The total number of training data points
is given by $N = N_o + N_j$.
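The four cardinalities follow directly from the rule outputs and the fault labels. The sketch below is a plain counting illustration; the default threshold `delta = 0.5` is an assumption, since the text leaves the value of the threshold open.

```python
def subset_counts(outputs, labels, fault, delta=0.5):
    """Cardinalities (N_cj, N_mj, N_mo, N_co) of the four subsets for rule output o^[2]."""
    n_cj = n_mj = n_mo = n_co = 0
    for o, y in zip(outputs, labels):
        fired = o > delta              # sample counts as classified as `fault`
        if y == fault:
            if fired:
                n_cj += 1              # correct: belongs to F_j and fires
            else:
                n_mj += 1              # missed sample of F_j
        else:
            if fired:
                n_mo += 1              # false alarm from another class
            else:
                n_co += 1              # correctly rejected
    return n_cj, n_mj, n_mo, n_co


print(subset_counts([0.9, 0.2, 0.7, 0.1], ["F1", "F1", "F2", "F2"], "F1"))  # (1, 1, 1, 1)
```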

The classification performance of $\mathcal{R}_a$ can now be evaluated following the ideas
of decision trees, based on a measure of entropy. Such measures can be used to compare
the impurity of the data sets after applying the rule $\mathcal{R}_a$ with the impurity of the
data set before applying it. Ideally, $\mathcal{R}_a$ should separate all data points from $F_j$
from the rest of the data. That would yield pure data sets and a minimal entropy.

Instead of calculating the logarithms in the entropy, however, a simpler strategy
was used: [17.55] suggests working with the cardinalities $N_{cj}$, $N_{mj}$, $N_{co}$, $N_{mo}$ and
states that maximizing the following entropy measure


(17.23)


is sufficient for all practical purposes. In the SELECT approach, however, (17.23)
will not work, since it is symmetric, i.e. it will also be maximized if the rule $\mathcal{R}_a$ is
fulfilled by all data except those from $F_j$. Therefore, a modified measure was created:


(17.24) 



It possesses the desired properties: it is maximal for a perfect separation and
penalizes wrong rule fulfillment.

Besides $S$, a simpler measure of the rule performance was also found heuristically
to be useful. It measures the difference $\Delta\bar{o}^{[2]}$ of the mean rule fulfillment
between data from class $F_j$ and all other data:

$$\Delta\bar{o}^{[2]} = \frac{1}{N_j}\sum_{k:\,F(k)=F_j} o^{[2]}(k) \;-\; \frac{1}{N_o}\sum_{k:\,F(k)\ne F_j} o^{[2]}(k) \qquad (17.25)$$

where $k$ denotes a running index over the training data set.
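The heuristic measure, the mean rule fulfillment for class $F_j$ minus the mean fulfillment for all other data, can be computed as sketched here. The function name is hypothetical, and both classes are assumed to be represented in the data.

```python
def mean_fulfillment_gap(outputs, labels, fault):
    """Mean rule output over samples of `fault` minus mean output over all others."""
    in_class = [o for o, y in zip(outputs, labels) if y == fault]
    others = [o for o, y in zip(outputs, labels) if y != fault]
    return sum(in_class) / len(in_class) - sum(others) / len(others)


gap = mean_fulfillment_gap([0.9, 0.7, 0.2, 0.0], ["F1", "F1", "F2", "F3"], "F1")
print(round(gap, 3))  # 0.7
```

A value close to 1 indicates a rule that fires for its own class and stays quiet otherwise; a value near 0 or below marks a useless candidate.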

The following example will illustrate the rule selection and evaluation approach. 


Learning example for SELECT 

Figure 17.16 shows a simple two-dimensional problem with three fault classes
$F_j$. For each dimension, two fuzzy attributes have been found; they are also shown
in Figure 17.16. The first three candidate rules $\mathcal{R}_a$ are:
$\mathcal{R}_1$: IF $s_2$ small THEN fault $F_1$

$\mathcal{R}_2$: IF $s_1$ large THEN fault $F_2$   (17.26)

$\mathcal{R}_3$: IF $s_1$ small THEN fault $F_3$

It should be emphasized that this set of rules does not suffice to solve the problem.
It is only a selection of possibilities from which a first rule can be taken. The second
rule $\mathcal{R}_2$ performs best and is kept as $\mathcal{R}^*$. The procedure removes all data
belonging to class $F_2$ from the training set, because they are correctly described by
$\mathcal{R}^*$. Again, new rules are created:

$\mathcal{R}_1$: IF $s_2$ small THEN fault $F_1$

$\mathcal{R}_2$: IF $s_2$ large THEN fault $F_3$   (17.27)

Note that a rule for class $F_2$ does not need to be designed any more.
Considering the data, both rules perform equally well. The first is retained as $\mathcal{R}^*$,
because there are more data samples from class $F_1$. Since the remaining data belong
only to $F_3$, there is no need to divide the data with an additional rule. The final
SELECT rules are:


IF $s_1$ large THEN fault $F_2$

ELSE IF $s_2$ small THEN fault $F_1$   (17.28)

ELSE fault $F_3$

Note that the keyword for the alternative, “ELSE”, appears in the rules.
Fig. 17.16. Simple three-class problem. The membership functions from the clustering are
shown on the right.


The example shows a characteristic of SELECT structures: the most informative
premises are selected first. Furthermore, the system tends to create a lower complexity
than a standard parallel rule base would. The reason is that the sequential
structure essentially “hides” complexity in the “ELSE” statement.
For comparison, the complete parallel rule set for the problem would be:

$\mathcal{R}^p_1$: IF ($s_1$ small) AND ($s_2$ small) THEN fault $F_1$

$\mathcal{R}^p_2$: IF ($s_1$ small) AND ($s_2$ large) THEN fault $F_3$

$\mathcal{R}^p_3$: IF ($s_1$ large) AND ($s_2$ small) THEN fault $F_2$

$\mathcal{R}^p_4$: IF ($s_1$ large) AND ($s_2$ large) THEN fault $F_2$   (17.29)
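The equivalence of the sequential SELECT rules (17.28) and the complete parallel rule set listed above can be checked on the crisp attribute combinations. The sketch is a crisp decision list for illustration only; the fuzzy evaluation with rule fulfillments is not modeled.

```python
def select_decision_list(s1, s2):
    """SELECT rules (17.28), evaluated top-down: the first matching rule wins."""
    if s1 == "large":
        return "F2"
    elif s2 == "small":
        return "F1"
    else:
        return "F3"


# complete parallel rule base: one rule per attribute combination
PARALLEL_RULES = {
    ("small", "small"): "F1",
    ("small", "large"): "F3",
    ("large", "small"): "F2",
    ("large", "large"): "F2",
}

for premise, fault in PARALLEL_RULES.items():
    print(premise, select_decision_list(*premise), fault)
```

The decision list needs three premise terms, while the parallel base uses eight; the difference is what the “ELSE” branches absorb.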


The last step of the SELECT learning approach is an optimization of the weights
$w_{ml}$. This is accompanied by a pruning process and is shown in the next section.


c) Optimization of relevance weights 


The rule extraction phase is followed by an optimization of the free parameters to
increase the performance of the system. The optimization allows fine-tuning to
better adapt to the data distribution. This is advantageous because, up to this point,
the rules are based on the fuzzy membership functions from the one-dimensional clustering.
By fine-tuning the relevance weights, one can influence, to a certain extent, the region
that is selected by a rule. This optimization is performed on the weights of multiple
inputs simultaneously. In contrast to the one-dimensional clustering, it can therefore
benefit from information concerning the correlation of the symptoms.

The procedure follows the neural network training approaches. The objective is
to minimize a least-squares function

$$E = \sum_{k=1}^{N} \bigl(f(k) - t(k)\bigr)^2 \rightarrow \min \qquad (17.30)$$


where $N$ is the number of training samples, $t(k)$ denotes the target data and $f$ are the
fault measures. A 0/1 scheme is used for $t$. The relevance weights $w_{ml}$ are subject to
the optimization. The optimization is performed for each element individually:
instead of a parallel optimization of all parameters, only the weights that are
connected to one AND-neuron are optimized. The database used for this optimization
is not always the complete training set. Rather, it follows the learning approach: in the
same way as during learning, the training set is reduced from one rule to the next.
Data samples that are correctly identified are not included.

In addition to this data reduction, the data samples that lead to a strong rule
fulfillment must be identified. This is done by evaluating the output $o_l^{[2]}$ of the
AND-operator when the data belonging to its conclusion are presented. This subset of the
training data and the data belonging to other classes are included in the optimization.
The optimization drives the output

$$o_l^{[2]}(s_i(k)) = \frac{1}{1 + e^{-\lambda\left(\sum_{m=1}^{n_p} w_{ml}\,\mu_i^{m} - 0.75\,n_p\right)}} \qquad (17.31)$$

of the rules with $n_p$ inputs towards 0 or 1.

In order to ensure the interpretability of the relevance weights, the following
constraints were introduced:

$$0 \le w_{ml} \le 1, \qquad \sum_{m} w_{ml} = n_p \qquad (17.32)$$

For the optimization, the standard procedure in neural networks is to employ a
backpropagation algorithm. The partial derivative of $o_l^{[2]}(s_i(k))$ can conveniently be
calculated with the known derivative of the sigmoidal function

$$f(x) = \operatorname{sig}(x) = \frac{1}{1 + e^{-x}} \qquad (17.33)$$

The further procedure, applying the softmax function [17.7], is described in [17.19].
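For a single AND-neuron, the gradient of a squared error with respect to the relevance weights follows from the chain rule and $\operatorname{sig}'(x) = \operatorname{sig}(x)\,(1 - \operatorname{sig}(x))$. The sketch below is a simplification under stated assumptions: it uses the neuron output itself instead of the fault measure $f$, does not enforce the constraints (17.32), and omits the softmax transformation of [17.19]. A finite-difference check confirms the analytic gradient.

```python
import math


def neuron_output(w, mu, lam):
    """AND-neuron output of the form (17.31) for one sample."""
    z = lam * (sum(wi * mi for wi, mi in zip(w, mu)) - 0.75 * len(w))
    return 1.0 / (1.0 + math.exp(-z))


def error_and_gradient(w, data, lam=5.0):
    """Squared error of one AND-neuron and its gradient w.r.t. the relevance weights.

    data : list of (membership values, 0/1 target) pairs
    """
    E, grad = 0.0, [0.0] * len(w)
    for mu, t in data:
        o = neuron_output(w, mu, lam)
        E += (o - t) ** 2
        # chain rule with sig'(z) = sig(z) * (1 - sig(z)) and dz/dw_m = lam * mu_m
        common = 2.0 * (o - t) * o * (1.0 - o) * lam
        for m, mi in enumerate(mu):
            grad[m] += common * mi
    return E, grad


w = [1.0, 0.8]
data = [([0.9, 0.7], 1.0), ([0.2, 0.9], 0.0)]
E, g = error_and_gradient(w, data)
```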

An important property of the optimization algorithm is the aforementioned sequential
optimization with decreasing training set size. Instead of one massively parallel
optimization, as is common for neural networks, a series of smaller optimizations
is performed here. This is of great computational advantage. A typical optimization
might include 10 individual optimization runs with 4 parameters each instead of a parallel
optimization with 40 free parameters. If the SELECT system consists of $n_r$ operators,
each having $n_p$ inputs, it creates a computational need of the order
$n_r \cdot O(n_p^2)$ instead of $O((n_r n_p)^2)$. This allows a typical optimization of the
structure to be performed in seconds to minutes on a standard personal computer, and it
also reduces the problems associated with a high-dimensional but sparsely populated
parameter space (known as the curse of dimensionality).
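As a rough illustration, reading the order terms as proportional costs and ignoring constant factors (an assumption of this estimate), the example of 10 runs with 4 parameters each gives:

$$n_r\,n_p^2 = 10 \cdot 4^2 = 160 \qquad \text{vs.} \qquad (n_r n_p)^2 = (10 \cdot 4)^2 = 1600,$$

i.e. a saving by the factor $n_r = 10$.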

Even more efficient, and faster for large data sets than the nonlinear optimization
with the softmax transformation, is an optimization based on a linearization of the
sigmoidal function of the AND-neuron. The idea is to work with an error description
that is linear with regard to the relevance weights. This enables the use of fast
optimization algorithms. The idea of the approach is visualized in Figure 17.17. It shows
the output $o_l^{[2]}$ as a function of the weighted sum of the inputs, here abbreviated as
$p$. By using this approximation of the operator function, the problem is posed with an
objective function into which the weights $w_{ml}$ enter linearly. The equality constraint
on the sum of the weights can also be included. The optimum is then found by a one-step
procedure. The derivation of the method can be found in [17.19]. An optimization with
the linearization showed results comparable to those of the nonlinear optimization.
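One way to realize such a one-step solution is an equality-constrained least-squares problem solved through its KKT system. The sketch below is a generic illustration, not the derivation of [17.19]: it assumes the sigmoid is replaced by its tangent at the threshold, $o \approx 0.5 + (\lambda/4)(p - 0.75\,n_p)$, maps the 0/1 targets to desired values of $p$, and enforces only the sum constraint on the weights; the bounds $0 \le w_{ml} \le 1$ are not enforced.

```python
import numpy as np


def linearized_weights(mu, targets, lam, weight_sum):
    """One-step weight estimate for a single AND-neuron with a linearized sigmoid.

    mu         : (N, n_p) membership values of the premise terms
    targets    : (N,) 0/1 targets
    lam        : slope parameter lambda of (17.18)
    weight_sum : required sum of the weights (equality constraint)
    """
    N, n_p = mu.shape
    # assumed tangent linearization: o = 0.5 + lam/4 * (p - 0.75*n_p)  =>  desired p
    b = 0.75 * n_p + (targets - 0.5) * 4.0 / lam
    A = mu
    ones = np.ones((n_p, 1))
    # KKT system of  min ||A w - b||^2  subject to  sum(w) = weight_sum
    kkt = np.block([[2 * A.T @ A, ones], [ones.T, np.zeros((1, 1))]])
    rhs = np.concatenate([2 * A.T @ b, [weight_sum]])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n_p]


rng = np.random.default_rng(0)
mu = rng.uniform(0.0, 1.0, size=(20, 3))
targets = (rng.uniform(size=20) > 0.5).astype(float)
w = linearized_weights(mu, targets, lam=8.0, weight_sum=3.0)
```

If weights leave the interval [0, 1], an additional step such as clipping or an active-set method would be needed.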

The described SELECT method offers several possibilities to incorporate prior
knowledge:

1) Manual placement of membership functions.

2) Manual augmentation of the rule structure with rules at the top. The weights of
these manually added rules can be improved by the optimization.

3) Structural information about similar fault situations can help to build a more
hierarchical structure that is easier to interpret physically.

The manual introduction of fuzzy rules is especially useful if the fault-symptom
relationships of some fault situations are known or can be derived from physical
understanding of the process under consideration. This also makes it possible to place
fuzzy rules manually in the tree, see [17.19]. Applications of the SELECT self-learning
diagnosis method are shown in Section 20.1.5 and in [17.19] and [17.62]. The required
computation time is relatively small, i.e. in the range of seconds to minutes.

Fig. 17.17. Linearization of the ANDOR-operator. The region of the sigmoidal function with the
transition from close to 0 to close to 1 is replaced by a straight line. This corresponds
to the original AND-operator, see Chapter 23.3. The dashed lines indicate the region where the
sigmoidal function is replaced by a straight line.


17.4 Problems 

1) Establish a fault tree of an illumination system consisting of a power supply,
cable, plug, fuse, switch and two lamps in parallel connection. The faults are
defects of the components. Symptoms are 1, 2 or no lamps burning. Design a
fault-diagnosis system.

2) State the differences between fuzzy-logic and probabilistic reasoning. 

3) In which cases is forward chaining as well as backward chaining feasible for 
fault diagnosis? 

4) Design a fuzzy logic diagnosis system for a dust cleaner, consisting of a power 
switch, a universal motor with fan, a dust container, a vacuum pressure sensor, a 
flexible tube and a nozzle. Observable outputs are the noise, pressure, operation 
time since replacing the dust container, and suction power. 

5) A DC motor driven drill machine shows irregular speed. Establish a fault- 
symptom tree and a fuzzy-logic diagnosis system if the observations are speed 
for idling and for load, brush fire and motion of the cable. 






Part IV 


Fault-Tolerant Systems 
Fault-tolerant design


Reliability can be increased by two different approaches, perfectness or
tolerance [18.4]. Perfectness refers to the idea of avoiding faults and
failures by means of an improved mechanical or electrical design. This includes the
continued technical advancement of all components to increase their service life.
During operation, the intactness of the components must be maintained by regular
maintenance and replacement of wearing parts. Methods that facilitate fault detection
at an early stage allow the regular maintenance schedule to be replaced by a
maintenance-on-demand scheme, as discussed in Chapters 2 and 3.

Tolerance describes the notion of containing the consequences of faults
and failures such that the components remain functional. This can be achieved by the
principle of fault tolerance, see also Section 2.4. Here, faults are compensated in
such a way that they do not lead to system failures. The most obvious way to reach
this goal is redundancy in components, units or subsystems. However, the overall
systems then become more complex and costly. In the following, various types of
fault-tolerant methods are briefly reviewed, see [18.3].

Fault-tolerance methods generally use redundancy. This means that in addition
to the considered module, one or more modules are connected, usually in parallel.
These redundant modules are either identical or diverse. Such redundant schemes
can be designed for hardware, software, information processing, and mechanical and
electrical components like sensors, actuators, microcomputers, buses, power
supplies, etc.


18.1 Basic redundant structures 

There are mainly two basic approaches to fault tolerance: static redundancy and
dynamic redundancy. The corresponding configurations are considered first for
electronic hardware and then for other components. Figure 18.1a shows a scheme for
static redundancy. It uses three or more parallel modules that have the same input
signal and are all active. Their outputs are connected to a voter, which compares these
signals and decides by majority which signal value is the correct one. If a triple
modular redundant system is applied and a fault in one of the modules generates a
wrong output, this faulty module is masked (i.e. not taken into account) by the
two-out-of-three voting. Hence, a single faulty module is tolerated without any effort for
specific fault detection. In general, $n$ redundant modules can tolerate $(n-1)/2$ faults ($n$ odd).
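A majority voter for $n$ redundant modules can be sketched as follows. The sketch assumes discrete (e.g. quantized or digital) module outputs, so that equal values can be compared exactly; for analog signals, voters usually compare within a tolerance or take the median instead.

```python
from collections import Counter


def majority_vote(outputs):
    """m-out-of-n voter: return the value reported by a strict majority of modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) // 2:
        return value  # faulty minority is masked
    raise RuntimeError("no majority: too many faulty modules")


def tolerable_faults(n):
    """Number of faulty modules an n-module voting system can mask (n odd)."""
    return (n - 1) // 2


print(majority_vote([5, 5, 9]), tolerable_faults(3))  # 5 1
```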

To improve the fault tolerance, the voter can also be made redundant [18.8].
Disadvantages of static redundancy are high costs, higher power consumption and weight.
Furthermore, it cannot tolerate common-mode faults, which appear in all modules
because of common fault sources.

Dynamic redundancy needs fewer modules at the cost of more information processing.
A minimal configuration consists of two modules (Figure 18.1b and c). One module is
usually in operation and, if it fails, the standby or back-up unit takes over.
This requires fault detection to observe whether the operating module becomes faulty.
Simple fault-detection methods only use the output signal, e.g. for consistency checking
(range of the signal), comparison with redundant modules, or the use of information
redundancy in computers like parity checking or watchdog timers. After fault detection,
it is the task of the reconfiguration to switch to the standby module and to
remove the faulty one.
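The interplay of fault detection and reconfiguration in dynamic redundancy can be sketched with a primary and one standby module. The range check as the only fault-detection method, the class name and the limits are assumptions of this sketch; in this form the standby is only invoked after switch-over, as for cold standby.

```python
class StandbyPair:
    """One operating module plus one standby module (dynamic redundancy).

    Fault detection is a simple range check of the output signal (assumed).
    """

    def __init__(self, primary, standby, low, high):
        self.modules = [primary, standby]
        self.active = 0
        self.low, self.high = low, high

    def step(self, u):
        y = self.modules[self.active](u)
        if not (self.low <= y <= self.high) and self.active == 0:
            self.active = 1  # reconfiguration: switch over to the standby module
            y = self.modules[self.active](u)
        return y


pair = StandbyPair(primary=lambda u: 1e6, standby=lambda u: 2.0 * u, low=-10.0, high=10.0)
print(pair.step(3.0), pair.active)  # 6.0 1
```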



Fig. 18.1. Fault-tolerant schemes for electronic hardware: (a) static redundancy: multiple-redundant
modules with majority voting and fault masking, m-out-of-n systems (all modules are active);
(b) dynamic redundancy: standby module that is continuously active, “hot standby”;
(c) dynamic redundancy: standby module that is inactive, “cold standby”.


In the arrangement of Figure 18.1b, the standby module is continuously operating,
which is called “hot standby”. The transfer time is then small, at the cost of operational
aging (wear-out) of the standby module.
Dynamic redundancy where the standby system is out of operation and does not
wear is shown in Figure 18.1c and is called “cold standby”. This arrangement needs two
more switches at the input and more transfer time due to a start-up procedure. For
both schemes, the performance of the fault detection is essential.

Dynamic redundancy can be extended to two or more standby modules, thus
tolerating two or more faults. Combinations of static and dynamic redundancy lead
to hybrid redundant schemes that avoid the disadvantages of both, at the cost of
higher complexity [18.8].

Similar redundant schemes as for electronic hardware exist for software fault tolerance, i.e. tolerance against mistakes in coding or errors of calculations, compare Table 18.1. The simplest form of static redundancy is repeated running ($n \ge 3$) of the same software and majority voting on the result. However, this only helps for some transient faults. As software faults are in general systematic and not random, a duplication of the same software does not help. Therefore, the redundancy must include diversity of software, like other programming teams, other languages or other compilers. With $n \ge 3$ diverse programs, a multiple-redundant system can be established, followed by majority voting as in Figure 18.1a. However, if only one processor is used, the calculation time increases, and using $n$ processors may be too costly.
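Majority voting over diverse program versions can be sketched as follows. The three "diverse" functions are hypothetical stand-ins for independently developed versions, one of which carries a systematic coding mistake that the voter masks.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value delivered by a strict majority of versions, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


def n_version_run(versions: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run n >= 3 diverse program versions on the same input and vote."""
    if len(versions) < 3:
        raise ValueError("majority voting needs at least three versions")
    return majority_vote([version(x) for version in versions])


# Three hypothetical 'diverse' versions of the same function; one has a coding mistake.
def square_a(x: int) -> int:
    return x * x


def square_b(x: int) -> int:
    return x ** 2


def square_c(x: int) -> int:
    return x * x + 1  # injected systematic fault


print(n_version_run([square_a, square_b, square_c], 7))  # 49
```

Exact-equality voting is adequate for integer results like these. Diverse floating-point programs rarely agree bit for bit, so they would need an inexact voter, for example mid-value selection or a tolerance band.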

Dynamic redundancy by using standby software with diverse programs can be realized by recovery blocks. This means that, in addition to the main software module, other diverse software modules exist, [18.8], [18.5].
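In the usual formulation of recovery blocks, the result of the main module is checked by an acceptance test, and the next diverse module is tried only if that test fails. The sketch below assumes this formulation. The square-root modules and the acceptance test are hypothetical examples.

```python
import math
from typing import Callable, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when no module delivers a result that passes the acceptance test."""


def recovery_block(x: float,
                   modules: Sequence[Callable[[float], float]],
                   acceptance_test: Callable[[float, float], bool]) -> float:
    """Try the main module first, then the diverse alternates, in order."""
    for module in modules:
        try:
            y = module(x)
        except (ArithmeticError, ValueError):
            continue  # a crashing module is treated like a failed acceptance test
        if acceptance_test(x, y):
            return y
    raise RecoveryBlockFailure("all modules failed the acceptance test")


def sqrt_main(x: float) -> float:
    """Hypothetical main module with a coding mistake (wrong sign)."""
    return -math.sqrt(x)


def sqrt_newton(x: float) -> float:
    """Hypothetical diverse alternate: Newton iteration instead of math.sqrt."""
    y = max(x, 1.0)
    for _ in range(60):
        y = 0.5 * (y + x / y)
    return y


def sqrt_ok(x: float, y: float) -> bool:
    """Acceptance test: non-negative result whose square reproduces x."""
    return y >= 0.0 and abs(y * y - x) <= 1e-9 * max(1.0, x)


print(recovery_block(2.0, [sqrt_main, sqrt_newton], sqrt_ok))  # approx. 1.41421356
```

Unlike majority voting, only one module runs in the fault-free case. The price is that the quality of the acceptance test limits which faults can be caught.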

For digital computers (microcomputers) with only a requirement for fail-safe behavior, a duplex configuration like Figure 18.2 can be applied. The output signals of two synchronized processors are compared in two comparators (software), which act on two switches in one of the outputs. If the output signals disagree, one of the switches disconnects the output from the following components. This scheme covers both hardware and software faults but is not fault-tolerant. It is useful if the loss of the output brings the system into a safe state. (This fail-safe scheme is used, e.g., for ABS braking systems.)
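The comparator logic of the duplex fail-safe scheme can be sketched as follows. The sketch assumes that the two switches lie in series in the output line of processor 1 and that agreement is judged against a tolerance `tol`. Both are illustrative assumptions; with identical synchronized software the tolerance could be zero.

```python
from typing import Optional


def duplex_fail_safe(y1: float, y2: float, tol: float) -> Optional[float]:
    """Duplex output stage of a fail-safe microcomputer pair (sketch).

    Each processor runs its own comparator; each comparator drives one of two
    series switches in the output line of processor 1. If either comparator
    sees a disagreement larger than `tol`, the output is disconnected
    (None here) and the following system must fall back to its safe state.
    """
    switch_a_closed = abs(y1 - y2) <= tol  # comparator in processor 1
    switch_b_closed = abs(y2 - y1) <= tol  # comparator in processor 2
    return y1 if (switch_a_closed and switch_b_closed) else None


print(duplex_fail_safe(0.500, 0.5004, tol=1e-3))  # 0.5: channels agree
print(duplex_fail_safe(0.500, 0.900, tol=1e-3))   # None: output disconnected
```

In software the two comparisons are trivially identical. In the hardware they run on different processors, so a single processor fault cannot keep both switches closed.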

Fault tolerance can also be designed for purely mechanical and electrical systems. Static redundancy is very often used in all kinds of homogeneous and inhomogeneous materials (e.g. metals and fibers) and in special mechanical constructions like lattice structures, spoked wheels and dual tires, or in electrical components with multiple wiring, multiple coil windings, multiple brushes for DC motors and multiple contacts for potentiometers. This natural, built-in fault tolerance is generally characterized by a parallel configuration like in Figure 18.3a. However, the inputs and outputs are not signals but, e.g., forces, electrical currents or energy flows, and a voter does not exist. All elements operate in parallel, and if one element fails (e.g. by breakage), the others take over a higher force or current, following the physical laws of compatibility or continuity. Hence, this is a kind of “stressful degradation”.
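The load increase behind this “stressful degradation” can be shown with a one-line calculation. The sketch assumes ideal, equal load sharing between identical parallel elements, which real structures only approximate.

```python
def load_per_element(total_load: float, n_elements: int, n_failed: int = 0) -> float:
    """Load on each intact element of n equal parallel elements.

    Assumes ideal, equal load sharing: after a breakage the intact elements
    take over the whole load ('stressful degradation').
    """
    intact = n_elements - n_failed
    if intact <= 0:
        raise ValueError("no intact element left: total failure")
    return total_load / intact


print(load_per_element(1000.0, 2))              # 500.0 (normal operation)
print(load_per_element(1000.0, 2, n_failed=1))  # 1000.0 (after one breakage)
```

For a duplex arrangement, each element therefore has to be dimensioned for the full load, even though it carries only half of it in normal operation.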

Examples are two gears, two belts, two chains, two valves, two hydraulic cylinders, two power supplies or two electrical motors, each carrying half the load in normal operation. Further examples are the tandem-piston system for hydraulic brakes and the double magneto ignition for aircraft engines. Fault tolerance by redundant kinematics was proposed by [18.9]. Mechanical and electrical systems with dynamic redundancy as depicted in Figure 18.1b, c can also be built. Hot standby, where the standby unit continuously operates in idle running, is usually not used because of unnecessary wear or ageing and power consumption. It only makes sense if an interruption during the transfer has to be avoided.




Fig. 18.2. Duplex microcomputer system with dynamic redundancy for fail-safe behavior 


Therefore, cold standby is mostly the meaningful choice, Figure 18.3b. Fault detection of the operating unit may be based on measured outputs like position, speed, force, torque, pressure, flow, voltage or current. However, then only large failures like a complete breakdown can be detected. Examples are standby feedwater pumps for steam boilers or backup power generators. To improve fault detection, the input signals and other intermediate signals should also be available. As this is rarely the case for purely mechanical components like linkages, gears or drive chains, or for purely electrical components like amplifiers, cables or transformers, dynamic redundancy can mainly be applied to electro-mechanical systems like speed- or position-controlled electromotor units, electro-hydraulic systems or electro-mechanical actuators.

Fault tolerance with dynamic redundancy and cold standby is especially attractive for mechatronic systems, where more measured signals and embedded computers are already available, so that fault detection can be improved considerably by applying process-model-based approaches. Table 18.1 summarizes the appropriate fault-tolerance methods for different systems.

Fig. 18.3. Fault-tolerant schemes for mechanical and electrical systems: (a) static redundancy for mechanical and electrical components: multiple redundant elements; (b) dynamic redundancy for electro-mechanical and mechatronic systems: standby module which is inactive, “cold standby”; measured input, output and intermediate signals are indicated


18.2 Degradation steps

Mainly because of cost, space and weight, a suitable compromise between the degree of fault tolerance and the number of redundant components has to be found. In contrast to fly-by-wire systems, for industrial, mobile and mechatronic systems only one or two failures have to be tolerated in hazardous cases, mainly because a safe state can be reached more easily and quickly. This means that not all components need very stringent fault-tolerance requirements. The following steps of degradation are distinguished:

• fail-operational (FO): one failure is tolerated, i.e. the component stays operational after one failure. This is required if no safe state exists immediately after the component fails;

• fail-safe (FS): after one (or several) failure(s), the component directly possesses a safe state (passive fail-safe, without external power) or is brought to a safe state by a special action (active fail-safe, with external power);

• fail-silent (FSIL): after one (or several) failure(s), the component is quiet externally, i.e. stays passive by switching off, and therefore does not influence other components in a wrong way.

For vehicles, it is proposed to subdivide FO into “long time” and “short time”. When considering these degradation steps for various components, one first has to check whether a safe state exists. For automobiles, a safe state is (usually) standstill (or low speed) at a non-hazardous place. For components of automobiles, a fail-safe status is (usually) a mechanical backup (i.e. a mechanical or hydraulic linkage) for direct manipulation by the driver. Passive fail-safe is then reached, e.g. after a failure of the electronics, if the vehicle comes to a stop independently of the electronics, e.g. by a closing spring in the throttle or by actions of the driver via the mechanical backup. However, if no mechanical backup exists, then after a failure of the electronics only an action by other electronics (switching to a still operating module) can bring the moving vehicle to a safe state, i.e. to a stop through active fail-safe. This requires the availability of electric power.
Generally, a graceful degradation is envisaged, where less critical functions are dropped to keep the more critical functions available, using priorities, [18.2]. Table 18.2 shows degradation steps to fail-operational for different redundant structures of electronic hardware. As the fail-safe status depends on the controlled system and the kind of components, it is not considered here.


Table 18.2. Fail behavior of electronic hardware for different redundant structures. FO: fail-operational; F: fail; FS: fail-safe (not considered)
For flight-control computers, usually a triplex structure with dynamic redundancy (hot standby) is used, which leads to FO-FO-FS, such that two failures are tolerated and a third one allows the pilot to operate manually, [18.7], [18.1], [18.6]. If the fault tolerance has to cover only one fault to stay fail-operational (FO-F), a triplex system with static redundancy or a duplex system with dynamic redundancy is appropriate. If fail-safe can be reached after one failure (FS), a duplex system with two comparators is sufficient. However, if one fault has to be tolerated while remaining fail-operational, and after a further fault it is possible to switch to fail-safe (FO-FS), either a triplex system with static redundancy or a duo-duplex system, see Figure 18.4, may be used. The duo-duplex system has the advantages of simpler failure detection and modularity.

Figure 18.5 shows the improvement of the reliability for some of the discussed fault-tolerant structures with dynamic redundancy. For example, if a single module with failure rate $\lambda = 2\cdot 10^{-4}\,\mathrm{h}^{-1}$ (e.g. a microcomputer), i.e. with $\mathrm{MTTF} = 5\cdot 10^{3}\,\mathrm{h}$, is used, the triplex or duo-duplex systems improve the failure rate to $\lambda_{tot} \approx 2\cdot 10^{-7}\,\mathrm{h}^{-1}$ or $\mathrm{MTTF}_{tot} = 5\cdot 10^{6}\,\mathrm{h}$. Hence, the failure rate is improved by a factor of 1000.
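The MTTF values in this example follow from $\mathrm{MTTF} = 1/\lambda$, which holds for a constant failure rate. The sketch below only reproduces this arithmetic. The value of $\lambda_{tot}$ itself is taken from Figure 18.5 and is not computed here.

```python
def mttf_hours(failure_rate_per_hour: float) -> float:
    """MTTF = 1/lambda for a constant failure rate (exponentially distributed lifetime)."""
    return 1.0 / failure_rate_per_hour


lambda_single = 2e-4  # single module, e.g. a microcomputer [1/h]
lambda_total = 2e-7   # triplex or duo-duplex, as read from Figure 18.5 [1/h]

print(mttf_hours(lambda_single))     # approx. 5e3 h
print(mttf_hours(lambda_total))      # approx. 5e6 h
print(lambda_single / lambda_total)  # improvement factor approx. 1000
```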


18.3 Problems 

1) How many modules are required for fail-safe behavior with static and with dynamic redundancy?
Fig. 18.4. Duo-duplex system with static redundancy




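Fig. 18.5. Improvement of the reliability for some fault-tolerant structures, [18.7]. $\lambda$: failure rate [$\mathrm{h}^{-1}$]; $\mathrm{MTTF} = 1/\lambda$: mean time to failure [h]; Example: one component: $\lambda = 2\cdot 10^{-4}\,\mathrm{h}^{-1} \rightarrow \mathrm{MTTF} = 5\cdot 10^{3}\,\mathrm{h}$; three components as triplex or duo-duplex: $\lambda = 2\cdot 10^{-7}\,\mathrm{h}^{-1} \rightarrow \mathrm{MTTF} = 5\cdot 10^{6}\,\mathrm{h}$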


2) Design fault-tolerant schemes for FO-FO-F behavior with static and dynamic 
redundancy. 

3) How is the reliability improved by a 2-out-of-3 static redundant system if the modules and the voter have a failure rate of $\lambda = 10^{-4}\,\mathrm{h}^{-1}$? Calculate also the MTTF.

4) How is the reliability improved by a hot-standby duplex system if the modules and the switch have a failure rate of $\lambda = 10^{-4}\,\mathrm{h}^{-1}$ and the fault-detection system $\lambda = 10^{-3}\,\mathrm{h}^{-1}$? Calculate also the MTTF.

5) Same problem as in 4), but with cold standby and a failure rate of the inactive module of $\lambda = 10^{-5}\,\mathrm{h}^{-1}$.
Fault-tolerant components and control


High-integrity systems require comprehensive overall fault tolerance through fault-tolerant components and corresponding control. This means the design of fault-tolerant sensors, actuators, process parts, computers, communication (bus systems) and control algorithms. Examples of components with multiple redundancy are known for aircraft, space and nuclear power systems. However, lower-cost components with built-in fault tolerance for general applications still have to be developed. In the following, some examples are given for sensors and actuators.


19.1 Fault-tolerant sensors 

A fault-tolerant sensor configuration should be at least fail-operational (FO) for one sensor fault. This can be obtained by applying hardware redundancy with the same type of sensors or by analytical redundancy with different sensors and process models.

19.1.1 Hardware sensor redundancy 

Sensor systems with static redundancy are realized, for example, with a triplex system and a voter, Figure 19.1a. A configuration with dynamic redundancy needs at least two sensors and fault detection for each sensor, Figure 19.1b. Usually, only hot standby is feasible. Another, less powerful possibility is plausibility checks for two sensors, also using signal models (e.g. the variance) to select the more plausible one, Figure 19.1c.
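The text does not specify the voter of the triplex system. One common realization is mid-value selection (the median), sketched below under that assumption.

```python
import statistics


def triplex_vote(m1: float, m2: float, m3: float) -> float:
    """Mid-value selection for three redundant sensors of the same type.

    Whichever single sensor fails (high, low or frozen), the median stays
    between the two healthy readings, so one fault is masked.
    """
    return statistics.median([m1, m2, m3])


print(triplex_vote(20.1, 20.3, 85.0))  # 20.3: the outlier of sensor 3 is masked
```

The median masks a single fault without any threshold. Isolating which sensor is faulty, for example to switch it off, still requires comparing each reading against the voted value.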

The fault detection can be performed by self-tests, e.g. by applying a known measurement value to the sensor. Another way uses self-validating sensors, [19.10], [19.6], where the sensor, transducer and a microprocessor form an integrated, decentralized unit with self-diagnostic capability. The self-diagnosis takes place within the sensor or transducer and uses several internal measurements, see also [19.19]. The output consists of the sensor’s best estimate of the measurement and a validity status, like good, suspect, impaired, bad and critical.
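The output of a self-validating sensor can be represented as a value paired with a status. The sketch below assumes that the listed statuses are ordered from best (good) to worst (critical), which the text does not state explicitly. It uses that order to pick the better of two readings, as in a duplex arrangement.

```python
from dataclasses import dataclass
from enum import Enum


class Validity(Enum):
    """Validity status as listed in the text; ordered here (by assumption) from best to worst."""
    GOOD = 0
    SUSPECT = 1
    IMPAIRED = 2
    BAD = 3
    CRITICAL = 4


@dataclass(frozen=True)
class ValidatedMeasurement:
    value: float      # sensor's best estimate of the measured quantity
    status: Validity  # result of the sensor-internal self-diagnosis


def select_duplex(a: ValidatedMeasurement, b: ValidatedMeasurement) -> ValidatedMeasurement:
    """Duplex with self-validating sensors: use the reading with the better status."""
    return a if a.status.value <= b.status.value else b


s1 = ValidatedMeasurement(101.7, Validity.BAD)
s2 = ValidatedMeasurement(98.4, Validity.GOOD)
print(select_duplex(s1, s2))
```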
Fig. 19.1. Fault-tolerant sensors with hardware redundancy: (a) triplex system with static redundancy and hot standby; (b) duplex system with dynamic redundancy, hot standby; (c) duplex system with dynamic redundancy, hot standby and plausibility checks


19.1.2 Analytical sensor redundancy 

As a simple example, a process with one input $u$, one main output $y_1$ and an auxiliary output $y_2$ is considered, see Figure 19.2a. Assume that the process input signal $u$ is not available but the two output signals $y_1$ and $y_2$, which both depend on $u$, are measured. Then one of the signals, e.g. $y_1$, can be reconstructed and used as a redundant signal if the process models $G_{M1}$ and $G_{M2}$ are known and no considerable disturbances appear (ideal case).

For a process with only one output sensor $y_1$ and one input sensor $u$, the output $y_1$ can be reconstructed if the process model $G_{M1}$ is known, Figure 19.2b. In both cases, the relationships between the process signals are used and expressed in the form of analytical models.
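The scheme of Figure 19.2b can be sketched in discrete time. The sketch assumes a hypothetical first-order model $G_{M1}$ with parameters `a` and `b` and the ideal case from the text (exact model, no disturbances). The difference between the measured and the reconstructed $y_1$ then acts as a residual that reveals an injected sensor fault.

```python
from typing import List, Sequence


def reconstruct_y1(u: Sequence[float], a: float, b: float, y0: float = 0.0) -> List[float]:
    """Reconstruct y1 from the measured input u with a known model G_M1.

    Hypothetical first-order model in discrete time (ideal case, exact model,
    no disturbances):  y1_hat[k] = a * y1_hat[k-1] + b * u[k-1]
    """
    y_hat = [y0]
    for k in range(1, len(u)):
        y_hat.append(a * y_hat[-1] + b * u[k - 1])
    return y_hat


a, b = 0.9, 0.1                      # hypothetical model parameters (static gain b / (1 - a) = 1)
u = [1.0] * 50                       # measured input: unit step
y1_sensor = reconstruct_y1(u, a, b)  # stands in for a fault-free sensor signal
y1_sensor[30] += 0.5                 # injected additive sensor fault at sample k = 30

y1_hat = reconstruct_y1(u, a, b)
residual = [m - r for m, r in zip(y1_sensor, y1_hat)]
print([k for k, r in enumerate(residual) if abs(r) > 0.1])  # [30]
```

With a real process the residual is never exactly zero, so the threshold (0.1 here, chosen arbitrarily) has to be set above the model error and noise level.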

To obtain one usable fault-tolerant measurement value $y_{1FT}$, at least three different values for $y_1$, e.g. the measured one and two reconstructed ones, must be available. This can be obtained by combining the schemes of Figure 19.2a and b, as shown in Figure 19.3a. A sensor fault in $y_1$ is then detected and masked by a majority voter, and either the value reconstructed from $y_2$ or the value reconstructed from $u$ is used as a replacement, depending on a further decision. (Single sensor faults in $y_2$ or $u$ are also tolerated with this scheme.)

One example of this combined analytical redundancy is the yaw rate sensor for the ESP (electronic stability program) of vehicles. Here the steering wheel angle can additionally be used as input to reconstruct the yaw rate through a vehicle model as in Figure 19.2b, and the lateral acceleration and the wheel speed difference of the right and left wheels (no slip) are used to reconstruct the yaw rate according to Figure 19.3a.
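A minimal sketch of the yaw-rate case follows. It assumes two simple kinematic relations that are not given in the text. The first is yaw rate = (right minus left wheel speed) / track width for a non-steered axle without slip. The second is the steady-state relation $a_y \approx v\,\dot\psi$ for the lateral acceleration. A median voter combines the two estimates with the sensor value. The tolerance and all numbers are hypothetical, and left turns are counted as positive.

```python
import statistics
from typing import List, Tuple


def yaw_rate_from_wheels(v_left: float, v_right: float, track: float) -> float:
    """Kinematic estimate from a non-steered axle, assuming no wheel slip [rad/s]."""
    return (v_right - v_left) / track


def yaw_rate_from_lat_acc(a_y: float, v: float) -> float:
    """Steady-state estimate from a_y = v * yaw_rate (sideslip rate neglected) [rad/s]."""
    return a_y / v


def fault_tolerant_yaw_rate(measured: float, from_wheels: float, from_acc: float,
                            tol: float = 0.02) -> Tuple[float, List[str]]:
    """Vote over the sensor value and two model-based values (Figure 19.3a idea).

    Returns the voted value and the names of the values that disagree with it.
    """
    values = {"sensor": measured, "wheels": from_wheels, "lat_acc": from_acc}
    voted = statistics.median(list(values.values()))
    outliers = [name for name, val in values.items() if abs(val - voted) > tol]
    return voted, outliers


# Hypothetical left turn at 20 m/s with a true yaw rate of 0.2 rad/s; the yaw sensor reads 0.35.
voted, outliers = fault_tolerant_yaw_rate(
    0.35,
    yaw_rate_from_wheels(v_left=19.85, v_right=20.15, track=1.5),
    yaw_rate_from_lat_acc(a_y=4.0, v=20.0),
)
print(voted, outliers)
```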



(b) 


process 

model 


Fig. 19.2. Sensor fault tolerance for one output signal $y_1$ (main sensor) through analytical redundancy by process models (basic schemes): (a) two measured outputs, no measured input; (b) one measured input and one measured output. $G_i$: $G_i(s)$ transfer functions.


A more general sensor fault-tolerant system can be designed if two output sensors and one input sensor yield measurements of the same quality. Then, with a scheme as shown in Figure 19.3b, three residuals can be generated, and a decision logic yields fault-tolerant outputs in the case of single faults of any of the three sensors. The residuals are generated based on parity equations. In this case, state observers can also be used for residual generation, compare, e.g., the dedicated observers by [19.5]. (Note that all schemes assume ideal cases. For realizability, constraints and additional filters have to be considered.)
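The decision logic for three residuals can be illustrated with structured residuals, each of which does not use one of the three sensors. The sketch assumes a purely static, hypothetical process $y_1 = k_1 u$, $y_2 = k_2 u$ so that the parity equations stay algebraic. The dynamic parity equations referred to in the text follow the same isolation idea.

```python
from typing import Tuple

# Fault signature table: which residuals exceed the threshold for a single fault.
SIGNATURES = {
    (1, 1, 0): "fault in y1",
    (1, 0, 1): "fault in y2",
    (0, 1, 1): "fault in u",
    (0, 0, 0): "no fault",
}


def residuals(y1: float, y2: float, u: float, k1: float, k2: float) -> Tuple[float, float, float]:
    """Static parity equations for a hypothetical ideal process y1 = k1*u, y2 = k2*u."""
    r1 = y1 - (k1 / k2) * y2  # uses y1, y2 (insensitive to u)
    r2 = y1 - k1 * u          # uses y1, u  (insensitive to y2)
    r3 = y2 - k2 * u          # uses y2, u  (insensitive to y1)
    return r1, r2, r3


def isolate(r: Tuple[float, float, float], threshold: float) -> str:
    """Decision logic: compare the residual pattern with the signature table."""
    pattern = tuple(int(abs(ri) > threshold) for ri in r)
    return SIGNATURES.get(pattern, "multiple faults / not isolable")


print(isolate(residuals(y1=2.0, y2=4.5, u=1.0, k1=2.0, k2=4.0), threshold=0.1))
```

Once the faulty sensor is isolated, the two remaining healthy signals together with the model provide the fault-tolerant outputs.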

If possible, a faulty sensor should be fail-silent, i.e. it should be switched off. However, this needs additional switches that lower the reliability. For both hardware and analytical sensor redundancy without fault detection for individual sensors, at least three measurements must be available to make one sensor fail-operational. However, if the sensor (system) has built-in fault detection (integrated self-test or self-validation), two measurements are enough and a scheme like Figure 19.1b can be applied. (This means that, by methods of fault detection, one element can be saved.)

Examples of fault-tolerant sensor systems are described in [19.11].


Fig. 19.3. Fault-tolerant sensors with combined analytical redundancy for two measured outputs and one measured input through (analytical) process models: (a) $y_1$ is the main measurement, $y_2$ and $u$ are auxiliary measurements (combination of Figure 19.2a and b); (b) $y_1$, $y_2$ and $u$ are measurements of the same quality (parity equation approach)


19.2 Fault-tolerant actuators 

Actuators generally consist of different parts: input transformer, actuation converter, actuation transformer and actuation element (e.g. a set of DC amplifier, DC motor, gear and valve), as shown in Figure 19.4a. The actuation converter converts one form of energy (e.g. electrical or pneumatic) into another form (e.g. mechanical or hydraulic). Available measurements are frequently the input signal $U_i$, the manipulated variable $U_o$ and an intermediate signal $U_3$.

Fault-tolerant actuators can be designed by using multiple complete actuators in parallel, either with static redundancy or with dynamic redundancy with cold or hot standby (Figure 18.1). One example of static redundancy is given by the hydraulic actuators for fly-by-wire aircraft, where at least two independent actuators operate with two independent hydraulic energy circuits.

Another possibility is to limit the redundancy to parts of the actuator that have 
the lowest reliability. Figure 19.4b shows a scheme where the actuation converter 
(motor) is split into separate parallel parts. Examples with static redundancy are two 
servo-valves for hydraulic actuators, [19.22] or three windings of an electrical motor 
(including power electronics), [19.17], see also [19.11]. Within electromotor-driven 
throttles for SI engines, only the slider is doubled to make the potentiometer position 
sensor static-redundant, see Section 20.2. 
Fig. 19.4. Fault-tolerant actuator: (a) common actuator; (b) actuator with duplex drive


One example of dynamic redundancy with cold standby is the cabin pressure flap actuator in aircraft, where two independent DC motors exist and act on one planetary gear, [19.21], see [19.11].

As cost and weight are generally higher than for sensors, actuators with a fail-operational duplex configuration are to be preferred. Then, either static-redundant structures, where both parts operate continuously, Figure 18.1a, or dynamic-redundant structures with hot standby, Figure 18.1b, or cold standby, Figure 18.1c, can be chosen. For dynamic redundancy, fault-detection methods for the actuator parts are required, [19.14]. One goal should always be that the faulty part of the actuator is fail-silent, i.e. has no influence on the redundant parts.


19.3 Fault-tolerant communication 

As shown in Section 19.5, fault-tolerant management systems also require a fault-tolerant communication system between several electronic control units, sensors and actuators (nodes). This can be realized by a multiple bus system, which has to meet hard real-time requirements. At least a dual bus system with two independent buses and independent power supplies must be realized. Both buses are then connected to all nodes, several of which are also at least dual. Hence, a multiple-access distributed real-time system results.

The CAN bus (Controller Area Network) was developed in 1983 for automobiles as a serial bus system with high reliability of data transfer and high flexibility and extendibility. It is an event-triggered and therefore asynchronous bus with highest-priority access, indicated by the node's identifier (CSMA/CA: carrier sense multiple access/collision avoidance protocol). Usually only soft real-time requirements can be satisfied, because the time behavior depends on the nodes. This means that a precise time behavior cannot be guaranteed.
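The priority access via the identifier can be sketched as CAN's bitwise arbitration. In standard CAN the identifier is sent most significant bit first on a wired-AND bus, so a dominant 0 overwrites a recessive 1, and the numerically lowest identifier has the highest priority. The sketch assumes unique 11-bit identifiers. It also shows why the waiting time of a low-priority message depends on the other traffic.

```python
from typing import Iterable


def can_arbitrate(identifiers: Iterable[int], id_bits: int = 11) -> int:
    """Bitwise CAN arbitration among nodes that start sending at the same time.

    Identifiers are sent MSB first on a wired-AND bus: a dominant 0 overwrites
    a recessive 1. A node that sends recessive but reads dominant withdraws,
    so the lowest identifier (highest priority) wins without destroying the frame.
    Identifiers are assumed to be unique, as required on a CAN bus.
    """
    contenders = set(identifiers)
    for bit in reversed(range(id_bits)):
        sent = {ident: (ident >> bit) & 1 for ident in contenders}
        bus_level = min(sent.values())  # wired-AND: any 0 pulls the bus to 0
        contenders = {ident for ident, level in sent.items() if level == bus_level}
    (winner,) = contenders
    return winner


print(hex(can_arbitrate([0x123, 0x0F0, 0x7FF])))  # 0xf0
```

The losing nodes retry after the winning frame, so under heavy load a low-priority message may be delayed repeatedly. This is the reason why only soft real-time behavior can be guaranteed.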

Time-triggered bus systems like TTP (time-triggered protocol) seem to be more suitable for the hard real-time requirements of drive-by-wire systems with sampling times of a few ms, [19.9]. The nodes obtain assigned time slots for their access to the bus (TDMA: time division multiple access), which yields a deterministic behavior. All nodes are designed to be fail-silent. This means that all subsystems have to detect their faults in value and also in time and switch into a passive state. Means to guarantee the exchange between fail-silent components are, e.g., composability, periodic data transfer, fast fault detection and global clock synchronization. For more details see [19.16], [19.31], [19.25], [19.9].
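The deterministic TDMA access can be sketched as a static schedule indexed by the synchronized global time. The node names, the slot length and the number of slots per round are hypothetical; a real TTP schedule (message descriptor list) contains considerably more information.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TdmaRound:
    slots: Tuple[str, ...]  # node owning each slot, in the order of one TDMA round
    slot_us: int            # hypothetical slot length [microseconds]

    def owner(self, global_time_us: int) -> str:
        """Node that may transmit at the given (synchronized) global time."""
        return self.slots[(global_time_us // self.slot_us) % len(self.slots)]

    def may_send(self, node: str, global_time_us: int) -> bool:
        """A node transmits only inside its own slot; otherwise it stays silent."""
        return self.owner(global_time_us) == node


schedule = TdmaRound(("steering_ECU", "brake_ECU", "sensor_node", "actuator_node"), slot_us=250)
print(schedule.owner(0), schedule.owner(260), schedule.owner(1000))
```

Because the access times are fixed in advance, a node that sends outside its slot can be detected immediately as faulty in time, which supports the fail-silent design.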


19.4 Fault-tolerant control systems 

For automatically controlled systems, the appearance of faults and failures in the actuators, the process and the sensors will usually affect the operating behavior. With feedforward control, generally all small or large faults influence the output variables and therefore, to a greater or lesser extent, the operation. However, if the system operates with feedback control, small additive or multiplicative faults in the actuator or process are in general compensated by the controller because of its usual robustness properties. The controller can even be made very robust with regard to some known smaller changes or faults in the actuator or the process. But this means a trade-off between good control performance and robustness against faults. This robustness property is a passive controller fault tolerance, because no active measures are undertaken. However, additive and gain sensor faults will immediately lead to deviations from the reference values. This also holds for feedforward control. For large changes in actuators, process and sensors, which exceed the robustness properties, the dynamic control behavior becomes either too sluggish, too weakly damped or even unstable. Then an active fault-tolerant control system is required to maintain the operation, Figure 19.5.

Active fault-tolerant control systems consist of fault-detection methods, a decision method and a reconfiguration mechanism, with the goal of keeping the operating behavior acceptable, compare the supervision loop, Figure 2.4. Depending on the fault, the fault-tolerant system may change the

• controller structure and parameters;

• used actuators;

• used sensors,

taking into account a degradation of the normal performance. Of major importance is that severe faults are detected quickly and reliably and that the reconfiguration settles within a very short time. This imposes high requirements on the real-time capabilities.



Fig. 19.5. Fault-tolerant control system: faults in actuators, process and sensors are detected 
and lead to a reconfiguration of the feedback and feedforward controller 


Early publications on fault-tolerant control appeared especially for aircraft, e.g. [19.26], [19.27], [19.18], [19.3], [19.4], [19.29], and for spacecraft, [19.7], [19.2]. A more recent summary of reconfigurable flight control is given in [19.1] and [19.8], see also [19.28]. An application to fault-tolerant lateral control is shown in [19.30].

A survey on fault-tolerant control in general is given by [19.23]. Four areas are considered: fault detection, robust control, reconfigurable control and supervision, which manages the fault decision and selects the control configuration. Passive fault-tolerant control with fixed, robust controllers and active fault-tolerant control with variable structure, based on fault accommodation, are distinguished. However, because of the many possibilities for the design and the individual practical requirements, the author states that only some structures and approaches could be described.

A further literature review on reconfigurable fault-tolerant control is [19.32]. The authors classify the literature according to control algorithms (e.g. pre-computed or online redesign) and application areas, like safety-critical, life-critical, mission-critical and cost-critical.

The stability analysis includes three stages: the fault-free period, the transient period during reconfiguration and the steady state after reconfiguration. Further important aspects are the constraints on inputs and states, the uncertainties of FDD, the delay of reconfiguration, closed-loop identification and real-time issues. The authors conclude that most publications focus mainly on algorithmic design, neglecting the overall architecture and technical platform. Hence, this area of fault-tolerant control is in an early state of development and needs more systematic research and practical realizations.
The following section describes the tasks of an automatic fault-management system, taking into account all components of control systems. The fault-tolerant control systems discussed in this chapter can be regarded as a subset of a general fault-management system.


19.5 Automatic fault-management system 

After considering fault-tolerant sensors, fault-tolerant actuators, fault-tolerant communication channels and fault-tolerant controllers, these different fault-tolerant components can now be brought together, resulting in an automatic fault-management system which operates online and in real time. This leads to the scheme depicted in Figure 19.6.

An automatic fault-management system consists of: 

1) fault-tolerant actuators:

• redundant identical or diverse actuators; 

• actuators with inherent fault tolerance; 

• actuator reconfiguration module. 

2) fault-tolerant sensors: 

• redundant identical or diverse sensors; 

• sensors with inherent fault tolerance; 

• virtual sensors based on analytical redundancy; 

• sensor reconfiguration module.

3) active fault-tolerant controllers: 

• redundant identical or diverse controller hardware; 

• redundant diverse controller software; 

• different controller structure and parameters:

- pre-designed for a priori known faults;

- redesigned or adaptive after fault detection.

4) fault-detection module: 

• normal closed-loop operating signals are used to detect and isolate faults in 
the components (parity equations, observers, parameter estimation); 

• test signals are introduced either periodically or on request to improve fault 
detection and, if required, fault diagnosis; 

• indication of the degree of impairment and degree of safety criticality. 

5) fault-management module: 

• decisions based on fault detection with an indication of the degree of impairment and effect on safety of the components;

• reconfiguration strategies with 

- hard or soft reconfiguration; 

- change of operating conditions (setpoints, process performance); 

- closed loop or open loop (feedforward) operation. 
The scheme in Figure 19.6 is an example with two manipulated and two controlled variables. If the normal actuator 1 fails, e.g. by getting stuck, another actuator 2 replaces its functions. This can be a second actuator of the same type or another actuator with a similar manipulation effect on the controlled variable. (The passenger cabin pressure outflow valve is an example of a duplex actuator, see [19.20], [19.11].) Certain additional control surfaces of aircraft are designed as other, redundant actuators, e.g. ailerons or rudders.

If the fault-detection system detects that sensor 1 has a large fault or even fails totally, a second sensor of the same type is switched in, or some analytical redundancy with other sensors is used to generate a virtual sensor output 2, as described in Section 19.1. (Examples are the electrical throttle with a double potentiometer, see Chapter 20, the model-based calculation of the yaw rate of automobiles from the lateral acceleration and wheel speed sensors, [19.12], or a horizontal and vertical gyro for the bank angle of aircraft, [19.23].)

Depending on the reconfigured actuators or sensors, the controller structure and/or controller parameters of the fault-tolerant controller also have to be reconfigured.

The structure of Figure 19.6 also holds for faults in the process itself. If the other actuator 2 or the other sensor 2 can then be used to maintain the operation, the reconfiguration just selects the actuator-sensor configuration and adjusts controller 2 and the reference variable $w_2$ accordingly. An example is a fluid 1/fluid 2 heat exchanger: the outlet temperature of fluid 1 can be manipulated by changing the flow of fluid 2 instead of the temperature of fluid 2, if the plant allows.

The controller structure and parameters have to be adapted to the new process behavior. If the transfer behavior of the reconfigured actuator-process-sensor system is known in advance, preprogrammed controllers just have to be switched. If it is not known, self-tuning or adaptive control algorithms could be used. However, this adaptation must be supervised and properly excited with perturbation signals, see [19.13], which can be a problem if a very fast recovery is required.
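Switching between preprogrammed controllers can be sketched as a lookup from the remaining actuator-sensor configuration to a pre-designed controller. The PI structure, the gains and the configuration names are hypothetical; the text does not prescribe a controller type. A configuration without a pre-designed controller returns `None`, where, for example, feedforward control could take over instead.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class PIController:
    """Discrete PI controller (illustrative structure, not taken from the text)."""
    kp: float
    ki: float
    dt: float
    integral: float = 0.0

    def update(self, w: float, y: float) -> float:
        e = w - y
        self.integral += self.ki * e * self.dt
        return self.kp * e + self.integral


# Hypothetical pre-designed gains, one set per actuator-sensor configuration.
PRESET_GAINS: Dict[Tuple[str, str], Dict[str, float]] = {
    ("actuator_1", "sensor_1"): {"kp": 2.0, "ki": 0.5},
    ("actuator_2", "sensor_1"): {"kp": 1.2, "ki": 0.3},
    ("actuator_1", "sensor_2"): {"kp": 1.5, "ki": 0.2},
}


def reconfigure(failed: Set[str], dt: float = 0.01) -> Tuple[Tuple[str, str], Optional[PIController]]:
    """Select the actuator-sensor configuration and switch to its preprogrammed controller."""
    actuator = "actuator_2" if "actuator_1" in failed else "actuator_1"
    sensor = "sensor_2" if "sensor_1" in failed else "sensor_1"
    config = (actuator, sensor)
    gains = PRESET_GAINS.get(config)
    return config, (PIController(dt=dt, **gains) if gains else None)


config, controller = reconfigure({"actuator_1"})
print(config, controller)
```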

The task of the fault-detection module is to detect faults in all components, like actuators, sensors, controllers and the process, as early as possible. A diagnostic capability is not necessarily required, because for the reconfiguration it is mostly sufficient to know whether the actuator or sensor has failed, regardless of the cause.

It should be mentioned that, in the case of closed-loop control, fault detection must be performed under closed-loop conditions, with all the problems discussed in Chapter 12. Because of the danger that a reconfigured replacement controller does not function as expected with a replacement sensor, it is sometimes better not to reconfigure an alternative closed-loop control but to apply a feedforward control without an output sensor substitute. This may result in a loss of control performance, but instability is avoided. This is used, for example, in engine control: if the oxygen sensor ($\lambda$-sensor) fails, a stoichiometric air/fuel ratio is maintained based on the air-flow measurement and the setpoint of the injected fuel mass.
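The engine-control fallback can be sketched as follows. The sketch assumes gasoline with an approximate stoichiometric air/fuel mass ratio of 14.7 and models the closed-loop $\lambda$ correction as a simple multiplicative factor, which is a simplification of real engine controllers.

```python
STOICH_AFR = 14.7  # approximate stoichiometric air/fuel mass ratio for gasoline


def fuel_mass_mg(air_mass_mg: float, lambda_sensor_ok: bool, lambda_correction: float = 1.0) -> float:
    """Fuel mass per injection [mg].

    Feedforward part: stoichiometric fuel mass from the measured air mass.
    While the lambda sensor is healthy, a closed-loop correction factor
    (assumed multiplicative here) trims it; after a sensor failure only
    the feedforward value is used.
    """
    feedforward = air_mass_mg / STOICH_AFR
    return feedforward * lambda_correction if lambda_sensor_ok else feedforward


print(round(fuel_mass_mg(500.0, lambda_sensor_ok=True, lambda_correction=1.03), 2))
print(round(fuel_mass_mg(500.0, lambda_sensor_ok=False, lambda_correction=1.03), 2))
```

Without the sensor, errors of the air-flow measurement or the injectors are no longer corrected. This is the loss of control performance accepted in exchange for avoiding an unreliable closed loop.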

Faults in the controller hardware or software can be detected as described in 
Chapter 12. Then new controllers as described above are applied or feedforward 
control is used. 
The discussion on automatic fault management shows that there are many different possibilities. Therefore it is difficult to treat applicable methods generally, and it is recommended to consider concrete cases.

An experimental investigation of fault detection in closed loop and reconfiguration to a redundant sensor is shown for the electrical throttle valve actuator in [19.24] and [19.15].


19.6 Problems 

1) How can a fault-tolerant temperature measurement system be built with two sensors (thermocouple and resistance thermometer) and with three sensors of the same type?

2) What are the differences between the degradation steps fail-safe, fail-operational and fail-silent? Take an electrically driven elevator as an example and explain the three steps for the electrical drive system, the position sensors and the control unit.

3) Design a fault-tolerant management system for the careful closing of elevator 
doors. 

4) Describe the possibilities for the construction of fault-tolerant drives with two 
electrical motors. 

5) What kind of fault tolerance do hydraulic brakes for passenger cars possess? What are the degradation steps? How many faults can be tolerated? What kind of faults may happen? Which information does the driver receive after a fault?
Fault detection and diagnosis of DC motor drives


The theoretically developed methods for fault detection and diagnosis require experimental testing with different kinds of technical processes. Most of the described methods assume ideal situations, for example linear behavior or specific structures of nonlinear processes, precise measurements, small disturbances, stationary stochastic disturbances, constant parameters or open-loop operation, and modelling of faults. However, in practice some of these simplifying assumptions are frequently violated. Therefore it is of interest how robust the treated methods are with regard to these violations. As already discussed in Chapter 13, the suitability of the different methods depends on the behavior of the processes and the real faults. Therefore some applications for two DC motors, a circulation pump and an automotive wheel suspension are shown in the following chapters, highlighting the advantages and disadvantages of the applied detection methods. Many more case studies and applications on, e.g., electrical, pneumatic and hydraulic actuators, AC motors, pumps, machine tools, robots, heat exchangers, pipelines, combustion engines and passenger cars will be treated in another book, [20.5].


20.1 DC motor 

20.1.1 DC motor test bench 

A permanently excited DC motor with a rated power of P = 550 W at the rated speed n = 2500 rpm is considered, [20.3]. This DC motor has a two-pair brush commutation, two pole pairs and an analog tachometer for speed measurement, and it operates against a hysteresis brake as load, see Figure 20.1. The measured signals are the armature voltage $U_A$, the armature current $I_A$ and the speed $\omega$. A servo amplifier with pulsewidth-modulated armature voltage as output and with speed and armature current as feedback allows a cascaded speed control system. The three measured signals first pass analog anti-aliasing filters and are then processed by a digital signal processor (TXP 32 CP, 32 bit floating point, 50 MHz) and an Intel Pentium host PC. The hysteresis brake is also controlled by a pulsewidth servo amplifier. Usually such DC motors can be described by linear dynamic models.

Fig. 20.1. DC motor test bench with hysteresis brake: a. test bench; b. scheme of equipment

However, experiments have shown that this model with constant parameters does not match the process in the whole operational range. Therefore, two nonlinearities are included so that the model fits the process better. The resulting first-order differential equations are:

$$ L_A \dot{I}_A(t) = -R_A I_A(t) - \Psi \omega(t) - K_B |\omega(t)| I_A(t) + U_A(t) \tag{20.1} $$

$$ J \dot{\omega}(t) = \Psi I_A(t) - M_{F1} \omega(t) - M_{F0} \operatorname{sign}(\omega(t)) - M_L(t) \tag{20.2} $$

Figure 20.2 depicts the resulting signal flow diagram. The term $K_B |\omega(t)| I_A(t)$ compensates for the voltage drop at the brushes in combination with a pulsewidth-modulated power supply. Friction is included by a viscous and a dry friction term, $M_{F1}\omega$ and $M_{F0}\operatorname{sign}(\omega)$. The parameters are identified by least-squares estimation in the continuous-time domain. Table 20.1 gives the nominal values. Most of them ($R_A$, $\Psi$, $K_B$, $M_{F1}$, $M_{F0}$) influence the process gain, and the other two ($L_A$, $J$) the time constants. The data are sampled at 5 kHz and processed by state variable filters with a fourth-order Butterworth lowpass characteristic and a cut-off frequency of 250 Hz.
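To make (20.1) and (20.2) concrete, the following Python sketch integrates the nonlinear motor model with an explicit Euler scheme for constant inputs. The parameter values are hypothetical placeholders (the nominal values of Table 20.1 are not reproduced here), and the names `PARAMS`, `derivatives` and `simulate` are illustrative only.

```python
# Hypothetical parameter values for illustration only (not Table 20.1).
PARAMS = {
    "R_A": 1.1,    # armature resistance [Ohm]
    "L_A": 4e-3,   # armature inductance [H]
    "Psi": 0.2,    # flux linkage [V s]
    "K_B": 1e-3,   # brush voltage-drop coefficient (assumed value)
    "J": 1e-3,     # moment of inertia [kg m^2]
    "M_F1": 1e-4,  # viscous friction coefficient [N m s]
    "M_F0": 0.02,  # dry (Coulomb) friction torque [N m]
}


def sign(x):
    """Signum function returning -1, 0 or +1."""
    return (x > 0) - (x < 0)


def derivatives(i_a, w, u_a, m_l, p=PARAMS):
    """Right-hand sides of (20.1) and (20.2): returns (dI_A/dt, dw/dt)."""
    di = (-p["R_A"] * i_a - p["Psi"] * w - p["K_B"] * abs(w) * i_a + u_a) / p["L_A"]
    dw = (p["Psi"] * i_a - p["M_F1"] * w - p["M_F0"] * sign(w) - m_l) / p["J"]
    return di, dw


def simulate(u_a, m_l, t_end, dt=1e-5, p=PARAMS):
    """Explicit Euler integration from standstill with constant U_A and M_L."""
    i_a, w = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        di, dw = derivatives(i_a, w, u_a, m_l, p)
        i_a += dt * di
        w += dt * dw
    return i_a, w
```

For example, `simulate(24.0, 0.0, 0.5)` runs the idle motor for 0.5 s at 24 V. The step size of 10 µs is far below the electrical time constant of these hypothetical values, so the explicit scheme stays stable; a real implementation would more likely use a higher-order integrator.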

Table 20.1. Data of the DC motor 



Fig. 20.2. Signal flow diagram of the considered DC motor 


20.1.2 Parity equations 

For the detection and isolation of sensor (output) and actuator (input) faults a set of 
structured parity equations with state-space models according to Section 10.2.1 is 
applied. 

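As the differential equations (20.1) and (20.2) are nonlinear, the design procedure for a linear parity space cannot be applied directly. However, treating the brush term $-K_B |\omega(t)| I_A(t)$ as an additional voltage input, lumped with $U_A$, and $M_{F0}\operatorname{sign}(\omega)$ as part of the load input $M_L$ leads to a linear description. With the state $x = [I_A \;\; \omega]^T$, the input $u = [U_A \;\; M_L]^T$ and both states measured, the linear state-space representation then becomes

$$ \dot{x}(t) = \begin{bmatrix} -R_A/L_A & -\Psi/L_A \\ \Psi/J & -M_{F1}/J \end{bmatrix} x(t) + \begin{bmatrix} 1/L_A & 0 \\ 0 & -1/J \end{bmatrix} u(t), \qquad y(t) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} x(t) \tag{20.3} $$

A corresponding signal flow diagram is depicted in Figure 10.8.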

An observability test reveals that each of the two outputs ($I_A$ and $\omega$) can be observed from the other. This is a precondition for a parity space of full order (here: 2). Then $W$, (10.52), is chosen such that a set of structured residuals is obtained, where residual $r_1(t)$ is independent of $M_L(t)$, $r_2(t)$ of $U_A(t)$, $r_3(t)$ of $\omega(t)$ and $r_4(t)$ of $I_A(t)$, see also [20.3], [20.7], [20.1]. Here the abbreviations

$$ \alpha = \Psi^2 + R_A M_{F1}, \qquad \beta = L_A M_{F1} + J R_A $$

are used.

The residuals then follow as:

$$ \begin{aligned}
r_1(t) &= L_A \dot{I}_A(t) + R_A I_A(t) + \Psi \omega(t) - U_A(t) \\
r_2(t) &= J \dot{\omega}(t) - \Psi I_A(t) + M_{F1} \omega(t) + M_L(t) \\
r_3(t) &= J L_A \ddot{I}_A(t) + (L_A M_{F1} + J R_A) \dot{I}_A(t) + (\Psi^2 + R_A M_{F1}) I_A(t) - J \dot{U}_A(t) - M_{F1} U_A(t) - \Psi M_L(t) \\
r_4(t) &= J L_A \ddot{\omega}(t) + (L_A M_{F1} + J R_A) \dot{\omega}(t) + (\Psi^2 + R_A M_{F1}) \omega(t) - \Psi U_A(t) + L_A \dot{M}_L(t) + R_A M_L(t)
\end{aligned} \tag{20.5} $$


(Up to sign and ordering, these residuals correspond to those of Example 10.3; for instance, $r_4 = -r'_2$, where $r'_i$ denotes residual $r_i$ of Example 10.3.) The same residual equations can also be obtained via transfer functions as described in Example 10.3. If an additive fault occurs in one of the measured signals or in $M_L$, all residuals except the one decoupled from that signal are deflected. The structure of the residuals is not affected by the compensation for the nonlinear voltage drop at the brushes, as its magnitude is small enough. The two parameters $R_A$ and $M_{F1}$, however, depend on the present motor temperature. The behavior of $R_A$ and its effect on residual $r_1$ are depicted in Figure 20.3. Therefore, the use of adaptive parity equations improves the residual performance, see Section 10.5.
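The decoupling structure can be checked numerically. The sketch below, with hypothetical parameter values and helper names, evaluates $r_1$ to $r_4$ of the residual equations above at a single time instant from signal values and derivatives (as a state variable filter would supply them). It first builds a fault-free sample consistent with (20.1) and (20.2), neglecting the small brush term, and then adds a constant offset to the speed measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Motor:
    R_A: float   # armature resistance [Ohm]
    L_A: float   # armature inductance [H]
    Psi: float   # flux linkage [V s]
    J: float     # moment of inertia [kg m^2]
    M_F1: float  # viscous friction coefficient [N m s]


def parity_residuals(p, s):
    """Residuals r1..r4 at one instant; s holds I, dI, ddI, w, dw, ddw, U, dU, M, dM."""
    alpha = p.Psi ** 2 + p.R_A * p.M_F1
    beta = p.L_A * p.M_F1 + p.J * p.R_A
    r1 = p.L_A * s["dI"] + p.R_A * s["I"] + p.Psi * s["w"] - s["U"]
    r2 = p.J * s["dw"] - p.Psi * s["I"] + p.M_F1 * s["w"] + s["M"]
    r3 = (p.J * p.L_A * s["ddI"] + beta * s["dI"] + alpha * s["I"]
          - p.J * s["dU"] - p.M_F1 * s["U"] - p.Psi * s["M"])
    r4 = (p.J * p.L_A * s["ddw"] + beta * s["dw"] + alpha * s["w"]
          - p.Psi * s["U"] + p.L_A * s["dM"] + p.R_A * s["M"])
    return r1, r2, r3, r4


def fault_free_sample(p, I, dI, ddI, w, dw, ddw):
    """Pick U_A, M_L and their derivatives so that the linear model holds exactly."""
    U = p.L_A * dI + p.R_A * I + p.Psi * w
    dU = p.L_A * ddI + p.R_A * dI + p.Psi * dw
    M = p.Psi * I - p.J * dw - p.M_F1 * w
    dM = p.Psi * dI - p.J * ddw - p.M_F1 * dw
    return dict(I=I, dI=dI, ddI=ddI, w=w, dw=dw, ddw=ddw, U=U, dU=dU, M=M, dM=dM)


motor = Motor(R_A=1.1, L_A=4e-3, Psi=0.2, J=1e-3, M_F1=1e-4)  # hypothetical values
sample = fault_free_sample(motor, I=2.0, dI=5.0, ddI=-30.0, w=150.0, dw=40.0, ddw=-200.0)
faulty = dict(sample, w=sample["w"] + 5.0)  # 5 rad/s offset on the speed sensor
```

In the fault-free case all four residuals are zero up to rounding. With the speed offset, $r_3$ stays at zero because it does not contain $\omega$, while $r_1$, $r_2$ and $r_4$ are shifted by $5\Psi$, $5M_{F1}$ and $5(\Psi^2 + R_A M_{F1})$, which is the structured pattern described in the text.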

The residuals are now examined with regard to their sensitivity to additive and parametric faults. As $r_1$ and $r_2$ together comprise all parameters and all signals, it is sufficient to consider only these two, although $r_3$ or $r_4$ can also be taken. From (20.5) and (10.82) it follows that

$$ \begin{aligned}
r_1(t) &= -\Delta L_A \dot{I}_A(t) - \Delta R_A I_A(t) - \Delta\Psi\, \omega(t) + L_A \Delta\dot{I}_A(t) + R_A \Delta I_A(t) + \Psi \Delta\omega(t) - \Delta U_A(t) \\
r_2(t) &= -\Delta J \dot{\omega}(t) + \Delta\Psi\, I_A(t) - \Delta M_{F1} \omega(t) + J \Delta\dot{\omega}(t) - \Psi \Delta I_A(t) + M_{F1} \Delta\omega(t) - \Delta M_L(t)
\end{aligned} \tag{20.6} $$
Fig. 20.3. Influence of the motor temperature on resistance $R_A$ and residual $r_1$


In the presence of residual noise, e.g. of $r_1$ with a magnitude of about 1 [V], and with an armature current of 3 [A], a resistance change must be at least about 0.3 [Ω] in order to deflect the residual significantly, since according to (20.6) it enters $r_1$ as $\Delta R_A I_A$ (1 V / 3 A ≈ 0.33 Ω). Therefore, the two linear parameters $R_A$ and $M_{F1}$ are selected to be tracked by single parameter estimation, see Section 10.5. The forgetting factor λ is chosen to be 0.99.
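The single parameter tracking of $R_A$ can be sketched as a scalar recursive least-squares estimator with forgetting factor λ = 0.99. This is a minimal illustration under the assumptions that $L_A$ and $\Psi$ stay at their nominal values and that $\dot I_A$ is available from a state variable filter; it is not presented as the exact algorithm of Section 10.5, and the function names and test values are hypothetical.

```python
import math


def track_resistance(samples, L_A, Psi, lam=0.99, r0=1.0, p0=100.0):
    """Scalar RLS with forgetting factor `lam` tracking R_A through r1.

    The first parity equation is rearranged into a one-parameter regression
        y(k) = U_A(k) - L_A * dI_A(k) - Psi * w(k) = R_A * I_A(k)
    with L_A and Psi held at their nominal values.
    samples: iterable of (U_A, I_A, dI_A, w) tuples.
    Yields the updated estimate of R_A after every sample.
    """
    r_hat, p = r0, p0
    for u, i, di, w in samples:
        y = u - L_A * di - Psi * w
        gain = p * i / (lam + i * p * i)
        r_hat += gain * (y - i * r_hat)
        p = (p - gain * i * p) / lam
        yield r_hat


def synthetic_samples(r_of_k, n, L_A, Psi):
    """Noise-free test data from the armature equation with a prescribed R_A(k)."""
    for k in range(n):
        i = 3.0 + math.sin(0.05 * k)   # armature current [A]
        di = math.cos(0.05 * k)        # its derivative [A/s], illustrative only
        w = 100.0                      # speed [rad/s]
        yield (r_of_k(k) * i + L_A * di + Psi * w, i, di, w)
```

With λ = 0.99 the estimator effectively averages over roughly 100 samples, so a slow temperature drift of $R_A$ is followed with a small lag, while a smaller λ would react faster at the cost of a noisier estimate.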

20.1.3 Parameter estimation 

The parameter estimation is based on the two differential equations (20.1), (20.2) in the simplified form

$$ \dot{I}_A(t) = -\theta_1 I_A(t) - \theta_2 \omega(t) + \theta_3 U_A(t) \tag{20.7} $$

$$ \dot{\omega}(t) = \theta_4 I_A(t) - \theta_5 \omega(t) - \theta_6 M_L(t) \tag{20.8} $$

with the process coefficients

$$ R_A = \frac{\theta_1}{\theta_3}; \quad L_A = \frac{1}{\theta_3}; \quad \Psi = \frac{\theta_2}{\theta_3} \quad \text{and} \quad \Psi = \frac{\theta_4}{\theta_6}; \quad J = \frac{1}{\theta_6}; \quad M_{F1} = \frac{\theta_5}{\theta_6} \tag{20.9} $$

Applying the recursive parameter estimation method DSFI (Discrete Square Root Filtering in Information Form), [20.6], with forgetting factor λ = 0.99 yields the parameter estimates $\hat{\theta}_i$. Then all process coefficients can be calculated with (20.9). Experimental results with idle running ($M_L = 0$) resulted in standard deviations of the process coefficients in the range 2 % < σ_θ < 6.5 %, [20.3].
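The following sketch illustrates (20.7) and the first half of (20.9) with a plain recursive least-squares estimator with forgetting factor. It deliberately swaps in ordinary RLS for the DSFI square-root algorithm used in the text (DSFI is chosen there for numerical robustness); the plain form is only meant to show the regression structure and the mapping back to physical coefficients. Parameter values and function names are hypothetical.

```python
import random


def rls_step(theta, P, x, y, lam):
    """One recursive least-squares update with exponential forgetting."""
    n = len(theta)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    err = y - sum(x[i] * theta[i] for i in range(n))   # a priori prediction error
    theta = [theta[i] + gain[i] * err for i in range(n)]
    # P is symmetric, so x^T P equals (P x)^T in the covariance update.
    P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P


def electrical_coefficients(theta):
    """(R_A, L_A, Psi) from (theta_1, theta_2, theta_3) according to (20.9)."""
    t1, t2, t3 = theta
    return t1 / t3, 1.0 / t3, t2 / t3


def estimate_electrical(data, lam=0.99, p0=1e4):
    """Estimate theta_1..theta_3 of (20.7) from samples (I_A, w, U_A, dI_A)."""
    theta = [0.0, 0.0, 0.0]
    P = [[p0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for i_a, w, u_a, di_a in data:
        x = [-i_a, -w, u_a]   # regressor chosen so that dI_A = x . theta
        theta, P = rls_step(theta, P, x, di_a, lam)
    return theta
```

The mechanical equation (20.8) can be treated the same way with the regressor $[I_A, -\omega, -M_L]$; in idle running ($M_L = 0$) the coefficient $\theta_6$ is then not excited.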

20.1.4 Experimental results for fault detection 

Based on many test runs, five different faults are now selected to show the detection of additive and multiplicative faults with parity equations and recursive parameter estimation, [20.2]. The time histories depict faults arising at t = 0.5 s. The faults are step changes and were produced artificially. Figure 20.4 shows the parameter estimates and the residuals of the parity equations. The residuals are normalized by division by their thresholds, so that exceeding 1 or −1 indicates the detection of a fault. In cases a) to d) and f) the DC motor is excited by a pseudo-random binary signal (PRBS) of the armature voltage $U_A$, which is required for dynamic parameter estimation, as shown in Figure 20.4f. In case e) the input is constant. The results can be summarized as follows:
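A PRBS of the kind used to excite $U_A$ can be generated with a maximal-length linear feedback shift register. The sketch below assumes a 7-bit register (period 127); the register length, clock period and amplitude actually used in the experiments are not stated in the text, and the function names are hypothetical.

```python
from itertools import islice


def prbs7(seed=0x7F, amplitude=1.0, offset=0.0):
    """Endless PRBS from a 7-bit maximal-length LFSR (period 2**7 - 1 = 127).

    Feedback s[k+7] = s[k] XOR s[k+1], i.e. the primitive polynomial
    x^7 + x + 1. Each output bit is mapped to offset +/- amplitude.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        yield offset + (amplitude if bit else -amplitude)


def prbs_input(n_bits, hold, amplitude, offset, seed=0x7F):
    """First n_bits PRBS values, each held for `hold` samples (the clock period)."""
    return [v for v in islice(prbs7(seed, amplitude, offset), n_bits) for _ in range(hold)]
```

Holding each bit for several sampling intervals shifts the signal energy to lower frequencies, which is usually chosen to match the time constants of the process to be identified.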

a) A sensor-gain fault of the voltage sensor $U_A$ leads, as expected, to a change of residual $r_1$ (and $r_3$, $r_4$) but not of residual $r_2$, which is independent of $U_A$. The parameter estimates show (incorrect) changes for $R_A$, $L_A$ and $\Psi$, because a gain of the voltage sensor is not modelled.

b) An offset fault in the speed sensor $\omega$ leads to a change of the residuals $r_4$, $r_1$ and $r_2$, but $r_3$ remains unaffected, because it is independent of $\omega$. A parameter estimate shows an (incorrect) change.

c) A multiplicative change of the armature resistance $R_A$ yields a corresponding change of the parameter estimate $\hat{R}_A$. However, the residuals increase their variance drastically and exceed (incorrectly) their thresholds.

d) A change of the inertia is correctly indicated by the parameter estimate $\hat{J}$. But all residuals except $r_1$ exceed their thresholds by increasing their variance.

e) The same fault in $R_A$ as in c) is introduced, but the input $U_A$ is kept constant. The parameter estimate $\hat{R}_A$ does not converge to a constant value, and the parity residuals $r_1$ and $r_4$ change their mean, however with large variance.

f) A brush fault leads to an increase of $\hat{R}_A$ and $\hat{L}_A$ but not of $\hat{\Psi}$. The residuals show an increase of their variance.

Table 20.2 summarizes the effects of some investigated faults on the parameter 
estimates and parity residuals. 

These investigations have shown:

1) Additive faults like sensor offsets are well detected by the parity equations. They react fast and, for some of the faults, do not need an input excitation. However, the residuals have a relatively large variance, especially if the model parameters do not fit the process well;

2) Multiplicative faults are well detected by parameter estimation, even small ones. Because of the inherent regression method, the reactions are slower but smoothed. However, parameter estimation requires an input excitation for dynamic process models.

Therefore, it is recommended to combine both methods, as shown in Section 14.3, Figure 14.1. The parity equations are used to detect changes somewhere in the process, and if the fault detection result is unclear, a parameter estimation is started, possibly with a dynamic test signal applied for some seconds. If the motor operates dynamically anyhow (as for servo systems and actuators), the parameter estimation can be applied continuously, but with a supervision scheme, see [20.6].
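The combination recommended above can be sketched as a simple decision function. The normalized parity residuals are checked against ±1, the pattern of exceeded thresholds in a short window is compared with the patterns expected for additive faults from the decoupling of Section 20.1.2 ($r_1$ independent of $M_L$, $r_2$ of $U_A$, $r_3$ of $\omega$, $r_4$ of $I_A$), and any other pattern triggers a parameter estimation. The pattern table, the window length and the function names are hypothetical illustrations, not the scheme of Section 14.3.

```python
# Expected deflection patterns (1 = |r_i / threshold_i| > 1) for additive faults.
ADDITIVE_PATTERNS = {
    (1, 0, 1, 1): "additive fault in U_A",
    (0, 1, 1, 1): "additive fault in M_L",
    (1, 1, 0, 1): "additive fault in omega",
    (1, 1, 1, 0): "additive fault in I_A",
}


def symptom_pattern(residual_history, thresholds, start, window):
    """OR-combine threshold exceedances over `window` samples starting at `start`."""
    pattern = [0] * len(thresholds)
    for res in residual_history[start:start + window]:
        for i, (r, t) in enumerate(zip(res, thresholds)):
            if abs(r / t) > 1.0:
                pattern[i] = 1
    return tuple(pattern)


def supervise(residual_history, thresholds, window=10):
    """Return (decision, index of first alarm) for a sequence of residual tuples."""
    for k, res in enumerate(residual_history):
        if any(abs(r / t) > 1.0 for r, t in zip(res, thresholds)):
            pattern = symptom_pattern(residual_history, thresholds, k, window)
            decision = ADDITIVE_PATTERNS.get(
                pattern, "unclear pattern: start parameter estimation")
            return decision, k
    return "no fault detected", None
```

A multiplicative fault that raises the variance of all residuals produces the pattern (1, 1, 1, 1), which is not in the table and therefore leads to a parameter estimation, in line with cases c) and d) above.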

[20.3] has shown that a considerable improvement can be obtained by continuously estimating the armature resistance with a single parameter estimation using parity equations, in order to track the temperature-dependent resistance parameter, [20.4]. Furthermore, adaptive thresholds are recommended to compensate for model uncertainties, see Section 7.5.
Table 20.2. Fault-symptom table for the fault detection of a DC motor with dynamic input excitation $U_A(t)$ in the form of a PRBS signal. +: positive deflection; ++: strong positive deflection; 0: no deflection; −: negative deflection; −−: strong negative deflection; ±: increased variance

faults | symptoms: parameter estimation | parity equations
20.1.5 Experimental results for fault diagnosis with SELECT

The model-based fault-detection system with parity equations and parameter estimation is now the basis for the fault diagnosis procedure, [20.1].

a) The symptoms used

To diagnose the faults, altogether 22 symptoms are created:

• Windowed sums of the absolute values of the three measured signals $U_A$, $I_A$, $\omega$;

• Mean values and standard deviations of the four residuals: $\bar{r}_1, \ldots, \bar{r}_4$ and $\sigma_{r1}, \ldots, \sigma_{r4}$;

• Eight parameter estimates. The symptoms are the deviations of the current values (results of the estimation) from the nominal ones, normalized to the nominal values. For the rotor resistance $R_A$ this is

$$ \Delta R_{A1} = \frac{R_{A,\mathrm{nom}} - R_{A,\mathrm{est}}}{R_{A,\mathrm{nom}}} $$

The index 1 denotes that the estimation was carried out using the first parity equation. Similarly, $\Delta R_{A4}$, $\Delta L_{A1}$, $\Delta L_{A4}$, $\Delta J_2$, $\Delta J_3$, $\Delta M_{F1,2}$ and $\Delta M_{F1,3}$ are computed;

• Additionally, two symptoms judge the quality of the estimation. They describe the variance of an estimated parameter during a recursive estimation. This variance can give a good indication of whether the structure of the estimation equation is valid: a structural change of the system will result in a bad estimation result, where the recursively estimated parameters fluctuate significantly. The estimates of two parameters were chosen: $\Psi$ and $M_{F1}$. Their estimation variances are denoted by $\sigma_{\mathrm{est},\Psi}$ and $\sigma_{\mathrm{est},M_{F1}}$.
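The symptom generation described above can be sketched in a few functions. This is a minimal illustration: the window length, the use of population statistics and the dictionary-based interface are assumptions, the function names are hypothetical, and the parameter deviation follows the definition given for $\Delta R_{A1}$.

```python
from statistics import fmean, pstdev, pvariance


def windowed_abs_sum(signal, window):
    """Sum of |x| over the most recent `window` samples of a measured signal."""
    return sum(abs(x) for x in signal[-window:])


def normalized_deviation(nominal, estimate):
    """(nominal - estimate) / nominal, the form used for Delta R_A1."""
    return (nominal - estimate) / nominal


def symptom_vector(signals, residuals, estimates, nominals, variance_tracks, window):
    """Flat symptom vector for one test run.

    signals:         dict name -> samples of a measured signal (e.g. U_A, I_A, omega)
    residuals:       dict name -> residual samples (e.g. r1 ... r4)
    estimates:       dict name -> recursive estimates; the last value is used
    nominals:        dict name -> nominal value for each key of `estimates`
    variance_tracks: dict name -> recursive estimates whose variance over the
                     window judges the estimation quality (e.g. Psi, M_F1)
    """
    vec = [windowed_abs_sum(x, window) for x in signals.values()]
    vec += [fmean(r[-window:]) for r in residuals.values()]
    vec += [pstdev(r[-window:]) for r in residuals.values()]
    vec += [normalized_deviation(nominals[k], est[-1]) for k, est in estimates.items()]
    vec += [pvariance(track[-window:]) for track in variance_tracks.values()]
    return vec
```

With three signals, four residuals, eight parameter deviations and two variance tracks, this returns 3 + 4 + 4 + 8 + 2 = 21 values per test run, i.e. one data point in the symptom space.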
The symptoms serve to differentiate between 14 fault situations that can be introduced artificially on the test rig.

The DC motor diagnosis was performed by learning a SELECT tree from experimentally gained fault data. For the fault cases, typically 10 to 50 test cycle measurements for a parameter estimation were performed. The residuals were computed from the test runs. That way, each test run results in one data point in the symptom space. The membership functions were created with the degressive fuzzy-c-means method. To achieve maximum transparency and create a highly interpretable system, prior knowledge was used to structure the diagnosis system.

b) Incorporation of structural knowledge 

In most applications, a certain amount of knowledge about the symptom behavior is present. Even if exact values for thresholds etc. are not known, there usually is some insight into the process, like a physical understanding of similar faults or of similar effects of faults on certain symptoms. For the DC motor, this could be as simple as using the windowed sums of the signals to detect a broken sensor cable. This information is quite obvious, but its benefits are sometimes neglected if a diagnosis system is designed with the aim of being learned solely from measured data. The task becomes simpler if the designer uses this information from the beginning.

Furthermore, the selection of the symptoms for the diagnosis becomes a matter of robustness. Some symptoms are affected by faults for which they are not an appropriate indicator. In an experimental environment, it is virtually impossible to gather enough measurements to adequately reflect every influence. In particular, changes in the environmental conditions and long-term changes due to wear are hardly captured in a limited time frame. This leads to diagnosis systems that work well under the experimental conditions but fail otherwise. The diagnosis of a fault should therefore be based on the appropriate subset of all available symptoms; only the relevant ones should be selected.

Often, different faults can be categorized into larger groups if their effects on 
the process are similar. It is then advantageous to find a classification system for 
the larger groups first and later separate within them. This leads to the concept of a 
hierarchical diagnosis system. 

Overall, it is proposed to use prior knowledge to structure the diagnosis system. The designer builds groups of faults and identifies the corresponding relevant symptoms to first differentiate between the groups and later within them. The exact decisions can be found automatically if enough measured data are available.

If the set of all different fault situations $F_j$ is denoted by

$$ F = \{F_1, F_2, \ldots, F_r\} \tag{20.10} $$

and the available symptoms are given by

$$ S = \{s_1, s_2, \ldots, s_l\} \tag{20.11} $$

one can form meta-classes $C_i$, $i = 1, \ldots, m$, with

$$ F = C_1 \cup C_2 \cup \ldots \cup C_m \tag{20.12} $$

In the DC motor diagnosis, for instance, such a meta-class is given by all faults on the mechanics of the motor. Such a hierarchy based on meta-classes requires at least $q = m + r$ decisions $d_j$, $j = 1, \ldots, q$, assuming that no $C_i$ is a single-element set. Each $d_j$ is based on a subset $S_{dj} \subseteq S$. The SELECT approach will then produce a system with $p$ parameters, where $p$ is given by

$$ p = \sum_{j=1}^{q} \operatorname{card}(S_{dj}) \tag{20.13} $$

which is typically much smaller than the number a parallel network structure would result in. The usually larger number of parameters in parallel network configurations can lead to slower convergence and ill-conditioned optimization problems.

In addition to the structural knowledge, one can incorporate more detailed knowl¬ 
edge into the individual rules if desired. 

c) Results 

A total of 14 different fault situations were applied on the DC motor test bench:

• Change of rotor resistance or inductance ($F_{RA}$, $F_{LA}$);

• Broken rotor wiring ($F_W$);

• Failure of one of the four brushes ($F_B$);

• Increased friction in the bearings ($F_F$);

• Offset on voltage, current or speed sensor signal ($F_{O,UA}$, $F_{O,IA}$, $F_{O,\omega}$);

• Gain change of voltage, current or speed sensor signal ($F_{G,UA}$, $F_{G,IA}$, $F_{G,\omega}$);

• Complete voltage, current or speed sensor failure ($F_{UA}$, $F_{IA}$, $F_\omega$).

Repeated experiments with different faults were performed using a test cycle. The 
symptoms described in a) were computed for each of the experiments. Overall, the 
training set for the approach consisted of data from 140 experiments. 

Figure 20.5 shows the resulting structure for the DC motor diagnosis. Details have been omitted in order to visualize only the concept. Each shaded block comprises a meta-class $C_i$ of faults. Every branching of the tree is connected to a decision $d_j$ learned with the SELECT approach, i.e. it contains a fuzzy rule. In each meta-class, a classification tree decides which individual fault has occurred based on a subset $S_i$ of the symptoms.

The hierarchical decision tree proved to be highly suitable for the diagnosis. It achieved a 98 % classification rate in a cross-validation scheme.

The groups of faults have been selected following a basic understanding of the DC motor supervision concept. Firstly, the three total sensor breakdowns differ from the other faults due to their strong effects on all symptoms. They form the first meta-class $C_1$ and can easily be differentiated by the three windowed sums of the signals. These three symptoms accordingly form the set $S_1$.
Fig. 20.5. Hierarchical fault diagnosis system. Each block comprises a fuzzy classification tree


Since the motor can be understood as a combination of an electrical and a mechanical component, faults in these two parts were again treated separately, creating two more meta-classes, $C_2$ and $C_3$. Accordingly, the appropriate subsets of symptoms $S_2$ and $S_3$ were selected for the diagnosis. Basically, $S_2$ and $S_3$ consist of the residuals and parameter deviations connected to the corresponding meta-class. The diagnosis of electrical faults, for instance, is not based on estimates of the mechanical parameters. Although some electrical faults may influence the estimates of the mechanical parameters, this influence should not be used, as these estimates are misleading and not reliable. Hence, $S_2$ does not contain $\Delta J_2$, $\Delta J_3$, $\Delta M_{F1,2}$ or $\Delta M_{F1,3}$.

To give an example of the SELECT approach, the rules for the distinction of the electrical faults are given below:

IF $\bar{r}_1$ is small AND $\Delta L_{A4}$ is strongly negative THEN Fault $F_{LA}$
ELSE IF $\bar{r}_1$ is small AND $\sigma_{r4}$ is medium THEN Fault $F_{RA}$
ELSE IF $\bar{r}_1$ is small AND $\sigma_{r4}$ is large THEN Fault $F_B$
ELSE IF $\bar{r}_2$ is not small THEN Fault $F_{O,IA}$          (20.14)
ELSE IF $\bar{r}_1$ is small THEN Fault $F_{G,IA}$
ELSE IF $\bar{r}_1$ is large AND $\sigma_{\mathrm{est},\Psi}$ is not small THEN Fault $F_{O,UA}$
ELSE Fault $F_{G,UA}$

The relevance indices of the rule premises are not listed here. They also play a role 
for the exact decision boundaries. 
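To illustrate how an ordered IF ... ELSE IF rule base such as (20.14) is evaluated, the sketch below uses hypothetical piecewise-linear membership functions on normalized symptoms, the minimum as AND operator and a crisp decision level of 0.5. It is a generic fuzzy rule cascade, not the SELECT algorithm: the learned membership functions and relevance indices are not given in the text, and the fault labels are plain string placeholders.

```python
def ramp_up(x, a, b):
    """Membership rising linearly from 0 at a to 1 at b (a < b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def ramp_down(x, a, b):
    """Membership falling linearly from 1 at a to 0 at b (a < b)."""
    return 1.0 - ramp_up(x, a, b)


# Hypothetical membership functions on normalized symptoms.
def small(x):
    return ramp_down(abs(x), 0.5, 1.0)


def medium(x):
    return min(ramp_up(abs(x), 0.5, 1.0), ramp_down(abs(x), 1.5, 2.5))


def large(x):
    return ramp_up(abs(x), 1.5, 2.5)


def strongly_negative(x):
    return ramp_down(x, -0.4, -0.2)


# Ordered rules: (premise degree as a function of the symptoms, fault label).
RULES = [
    (lambda s: min(small(s["r1"]), strongly_negative(s["dL_A4"])), "F_LA"),
    (lambda s: min(small(s["r1"]), medium(s["sigma_r4"])), "F_RA"),
    (lambda s: min(small(s["r1"]), large(s["sigma_r4"])), "F_B"),
    (lambda s: 1.0 - small(s["r2"]), "F_O,IA"),
    (lambda s: small(s["r1"]), "F_G,IA"),
    (lambda s: min(large(s["r1"]), 1.0 - small(s["sigma_est_psi"])), "F_O,UA"),
]


def classify_electrical(s, level=0.5):
    """Return the label of the first rule whose degree reaches `level`, else the ELSE case."""
    for premise, fault in RULES:
        if premise(s) >= level:
            return fault
    return "F_G,UA"
```

The ordering matters: a later rule such as "$\bar r_1$ is small" only applies to cases that none of the earlier, more specific rules have claimed.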

Nevertheless, it is possible to analyze and understand parts of these rules. Clearly, the rules reveal the discriminatory power of the first residual, since it was used very often. Other rule premises are also understandable. The change of the rotor inductance is indicated by a strongly negative estimate of this change. Compare this rule with Figure 20.6a. It shows the values $\Delta L_{A4}$ for the electrical faults from the training set. Clearly, the fault $F_{LA}$ makes a distinct difference. Hence, it makes sense to use $\Delta L_{A4}$ to distinguish this fault from the others. The corresponding membership functions are shown in Figure 20.6b. It must be noted that the experimental setup allowed only a fixed deviation of the inductance of −50 % as a fault, which can be seen in the estimation result. If, however, positive changes are also to be diagnosed, the rule can be enhanced manually. For instance, one could use

IF $\bar{r}_1$ is small AND $\Delta L_{A4}$ is not small THEN Fault $F_{LA}$          (20.15)

The membership functions for $\Delta L_{A4}$ would then also have to be adapted accordingly to allow the processing of positive values of $\Delta L_{A4}$.



Fig. 20.6. Estimated rotor inductance computed from the fourth parity residual. Apparently, 
most faults influence the result, however, the faulty inductance can most easily be detected due 
to its strong influence: a. estimation results; b. resulting membership functions 


Another interesting observation is the use of $\sigma_{\mathrm{est},\Psi}$ in the sixth rule of (20.14) to distinguish offset from gain faults of the voltage sensor. This can be explained by the fact that an offset fault introduces an offset term into the estimation equation and thus changes its structure, while a gain fault only affects the parameters. Hence, the nominal estimation equation remains valid in the case of gain faults, but indicates a problem through a large $\sigma_{\mathrm{est},\Psi}$ for offset faults.

The system performed well on new experiments, showing the increased robustness gained by incorporating very simple knowledge. Additionally, the system has a higher degree of transparency, facilitating an adaptation to other motors. The diagnostic rules can be extracted and are largely understandable.

d) Relation to fault trees 

The resulting hierarchical classifier can also be interpreted as a set of fuzzy fault trees. If one reverses the order of the structure and traces the decisions leading to a particular fault back through the tree, it is possible to explicitly draw a fault tree for each individual fault. Figure 20.7 shows one fault situation (increased friction in the motor) as an example. The intermediate steps like “Mechanical Fault” from Figure 20.7 become events of the fault tree.



Fig. 20.7. Fault tree for one particular fault extracted from the diagnostic tree in Figure 20.5 


Similar fault trees can be constructed for the other faults. This requires analyzing the rule tree and drawing the trees explicitly. The resulting set of trees is a relatively redundant representation of the fault-symptom relation, because the same events are used in multiple trees. The trees are nevertheless very intuitive and serve to understand and visualize the functionality of the diagnostic system.

e) Computational demands 

The most time-critical part of the presented supervision concept is the computation of the continuous-time residuals. They require the evaluation of state-variable filters that are difficult to implement in fixed-point arithmetic. If the computational resources are limited, a discrete-time form of the residuals is also possible. This has, for instance, been implemented by [20.9].
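A discrete-time form of $r_1$ can be sketched by replacing the derivative $\dot I_A$ with a backward difference over one sampling interval $T_0$. This is an assumption for illustration; the discretization actually used in [20.9] is not described here, and the function name is hypothetical.

```python
def r1_discrete(u_a, i_a, w, T0, R_A, L_A, Psi):
    """Discrete-time residual r1 with a backward difference for dI_A/dt.

    r1(k) = L_A * (I_A(k) - I_A(k-1)) / T0 + R_A * I_A(k) + Psi * w(k) - U_A(k)
    Returns the residuals for k = 1 .. N-1 (sample 0 has no predecessor).
    """
    return [
        L_A * (i_a[k] - i_a[k - 1]) / T0 + R_A * i_a[k] + Psi * w[k] - u_a[k]
        for k in range(1, len(i_a))
    ]
```

A plain backward difference amplifies measurement noise, so in practice the signals would still be lowpass filtered first; the difference form only avoids the continuous-time state variable filter for the derivative itself.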

The diagnosis, on the other hand, only needs to be evaluated if the fault-detection thresholds are violated. It is not time critical and can, for instance, be computed as a background job in the motor controller. Similarly, floating-point computations, such as the evaluation of the exponential function in the SELECT neuron, can always be implemented on a lower-precision fixed-point controller, for instance by using lookup tables. If the computation time is not critical, one can also emulate floating-point arithmetic on fixed-point controllers. Since the time needed for the diagnosis is small compared with the time that personnel typically need to reach a faulty device, the computational demand should not really be an issue. Safety-critical measures can be taken as soon as the thresholds are violated, even before the diagnosis is started.
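A lookup table with linear interpolation, as mentioned above, can be sketched as follows. The function exp(−x), the Q12 input format, the Q15 output format and the table spacing of 1/16 are assumptions chosen for illustration; the table is computed offline, and the run-time function `exp_neg_q` uses only integer shifts, masks, additions and multiplications.

```python
import math

X_FRAC_BITS = 12     # input format Q12: x_q = round(x * 2**12)
Y_SCALE = 32767      # output format Q15: y = y_q / 32767
STEP_BITS = 8        # table spacing 2**8 / 2**12 = 1/16
X_MAX = 8.0          # beyond this, exp(-x) < 3.4e-4 and the output saturates

# Computed offline; on a controller this would be a constant integer array.
TABLE = [round(math.exp(-k / 16) * Y_SCALE) for k in range(int(X_MAX * 16) + 1)]


def to_q12(x):
    """Convert a non-negative float to the Q12 input format."""
    return round(x * (1 << X_FRAC_BITS))


def exp_neg_q(x_q):
    """exp(-x) in Q15 for x_q = x in Q12 (x >= 0), using integer operations only."""
    if x_q < 0:
        raise ValueError("x must be non-negative")
    idx = x_q >> STEP_BITS
    if idx >= len(TABLE) - 1:
        return TABLE[-1]
    frac = x_q & ((1 << STEP_BITS) - 1)
    lo, hi = TABLE[idx], TABLE[idx + 1]
    return lo + (((hi - lo) * frac) >> STEP_BITS)
```

With a spacing of 1/16 the interpolation error of exp(−x) stays below about 5·10⁻⁴, which is usually far finer than the resolution needed for fuzzy membership degrees.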

Summary 

The detailed theoretical and experimental investigations with the permanently excited DC motor, in idle running or loaded by a rotating electrical hysteresis brake, have demonstrated that it is possible to detect 14 different faults by measuring only three signals and combining the parity equation and parameter estimation approaches. Furthermore, by applying the self-learning neuro-fuzzy system SELECT, all faults could be diagnosed with a 98 % correct classification rate.


20.2 Electrical throttle valve actuator 

Automobile actuators have to operate very reliably under hard ambient conditions such as a wide temperature range, vibrations and disturbances in signals and power supply. Friction and time-varying process parameters, mainly caused by temperature influences, make it difficult to achieve fast and precise positioning control and high reliability. The investigated throttle-valve actuator is used in spark-ignition combustion engines to control the air mass flow through the intake manifold into the cylinders. This automotive actuator is embedded in various control systems such as engine control, traction control and velocity control, which require fast and precise operation.

20.2.1 Actuator setup 

Figure 20.8 shows a schematic of the actuator. A permanently excited DC motor with a gear turns the throttle valve against the closing torque of a helical spring. The motor is driven by a pulsewidth-modulated (PWM) armature voltage $U_A$, which is measured, as is the resulting armature current $I_A$. The angular position $\varphi_K$ is measured redundantly in the range [0 ... 90°] by two potentiometers.

Theoretical modelling and measurements have shown that the model structure illustrated in Figure 20.9 is a sufficient basis for control design and fault detection. The gear was modelled as a proportional factor with the reduction ratio ν, [20.8], [20.9].

The inductance was neglected because the electrical time constant is about 1 ms and therefore much smaller than the mechanical one. Other parameters are the armature resistance $R_A$, the magnetic flux linkage $\Psi$, the inertia $J$, the viscous friction coefficient $M_F$ and the spring constant $c_F$. The signal $M_{ext}$ includes the spring pretension and external load torques. The armature circuit is then described by

$$ U_A(t) = R_A I_A(t) + \Psi \omega_A(t) \tag{20.16} $$

The driving torque of the DC motor is

$$ M_{el}(t) = \Psi I_A(t) \tag{20.17} $$

The precise and comprehensive modelling of the mechanical subsystem turned out to be rather difficult because of several nonlinear effects. For positioning above the limp-home position $\varphi_{K0}$ the model used is

$$ \nu J \ddot{\varphi}_K(t) = \Psi I_A(t) - \frac{1}{\nu}\left(c_F \varphi_K(t) + M_0\right) - M_F \omega_A(t), \qquad \varphi_K > \varphi_{K0} \tag{20.18} $$

$$ \varphi_A(t) = \nu \varphi_K(t) \tag{20.19} $$

The generation of the symptoms for fault detection is based on the combination of 
parameter estimation and parity equations. 

20.2.2 Parameter estimation 

The parameter estimation is performed by two recursive estimators, one for the elec¬ 
trical and one for the mechanical subsystem. Figure 20.10 shows the overall arrange¬ 
ment. The parameter estimation is performed separately for the electrical and me¬ 
chanical subsystem. The parameter estimation of the electrical part uses the data 
vector and parameter vector as follows 

fU.k) = [I A (k) v <p k (t) 1] (20.20) 

0 T e (k) = [R A * Coe? = [del 0 e2 d e3 ] T (20.21) 

c oe is a constant for modelling additive (sensor) faults. The physical process coeffi¬ 
cients result directly from 


R A = 0 e i; ¥ = 6 el \ Coe = 03 (20.22) 

For the mechanical subsystem it holds 

ti(k) = [IA{k) - <p k (t) - hit) - 1] (20.23) 

cf Mp M 0 
y J v 2 J J v 2 J 

[0ml 0m2 0m3 0m4] (20.24) 

nit) (20.25) 

The physical process coefficients can then be calculated from the parameter estimates as follows:

$$ J = \frac{\Psi}{\nu\, \theta_{m1}}; \quad c_F = \nu^2 J \theta_{m2}; \quad M_F = J \theta_{m3}; \quad M_0 = \nu^2 J \theta_{m4} \tag{20.26} $$

Hence, all physical process coefficients can be determined uniquely, using $\Psi$ from the electrical subsystem. The derivatives $\dot{\varphi}_K(t)$ and $\ddot{\varphi}_K(t)$ are generated by a state variable filter, see Figure 20.10.
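The coefficient mapping (20.26) can be written as a small function; the inverse mapping (20.24) is included so that the two can be checked against each other. The numerical values in the test are hypothetical, and the function names are illustrative.

```python
def mechanical_coefficients(theta_m, psi, nu):
    """Physical coefficients of the throttle mechanics according to (20.26).

    theta_m: (theta_m1, ..., theta_m4) estimated for the regression model
             phi_K'' = theta_m1 I_A - theta_m2 phi_K - theta_m3 phi_K' - theta_m4
    psi:     flux linkage estimated in the electrical subsystem
    nu:      gear reduction ratio
    """
    t1, t2, t3, t4 = theta_m
    J = psi / (nu * t1)
    return {"J": J, "c_F": nu ** 2 * J * t2, "M_F": J * t3, "M_0": nu ** 2 * J * t4}


def mechanical_parameters_to_theta(J, c_F, M_F, M_0, psi, nu):
    """Inverse mapping (20.24): parameter vector for given physical coefficients."""
    return (psi / (nu * J), c_F / (nu ** 2 * J), M_F / J, M_0 / (nu ** 2 * J))
```

The inertia can only be separated from the other coefficients because $\Psi$ is supplied by the electrical estimator, which is why both estimators run side by side.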



… state variable filters: $T_{0,SVF}$ = 2 ms; sampling time for parameter estimation: $T_0$ = 6 ms; corner frequency of the state variable filter: $f_g$ = 20 Hz; critical angular velocity: $\omega_{krit}$ = 1.5 rad/s
The parameter estimation was performed with recursive least squares using UD factorization, see Section 9.2.3. The parameter estimation is switched off for small speeds $|\omega_K| < \omega_{krit}$ to avoid problems with the strongly nonlinear friction effects during slow motion. Figure 20.11 shows some signals for the fault-free throttle and for an increased armature resistance during an excitation with a PRBS at the position controller set point. The sampling time was $T_0$ = 6 ms. The three parameters converge very fast to fixed final values. After a resistance increase is introduced, the parameter estimate $\hat{R}_A$ increases, while the flux linkage $\hat{\Psi}$ and the constant $\hat{c}_{0e}$ remain constant, as expected.



Fig. 20.11. Measured signals and parameter estimates of the electrical subsystem for the fault-free case and for an increased armature resistance $R_A$


The parameter estimates $\hat{J}$, $\hat{M}_0$ and $\hat{M}_F$ of the mechanical subsystem converge very fast to approximately constant values; only the spring constant $\hat{c}_F$ needs about 3 s and shows a larger variance, see Figure 20.12.

The influence of different faults on all parameter estimates is shown in Table 20.3. Except for F1, F2 and F11, all faults show different symptom patterns. The symptom patterns for the process parameter faults F3 to F8 show better isolability than those for the sensor offset faults.
Fig. 20.12. Signals and parameter estimates of the mechanical subsystem for the fault-free case


20.2.3 Parity equations 

The application of parity equations as described in Example 10.2 and in Section 20.1 
requires a relatively precise process model with constant parameters. The design 
and practical experiences with different approaches of parity space methods for the 
electrical throttle has shown, however, that the residuals show too large variances 
because of model discrepancies and normal disturbances, [20.9]. In contrast to the 
DC motor example in Section 20.1, the electrical throttle includes a mechanical load 
with gear and a helical spring with relatively large pretension. Especially because 
of the friction effects at the bearings, the gear and spring adjustment, it was not 
possible to find an overall model of the mechanical system which can be used in 
parity equations. Therefore, the parity approach was finally limited to the electrical 
subsystem only, leading to the residual 

n (t) = U A (t) - Ra I A (t) - 9 V (fi\ K (t) (20.27) 

where tp \k is the throttle angle of potentiometer 1 (one out of two possible positive 
measurements). This residual will deflect for offset faults of the sensors U A , I a and 
Table 20.3. Process parameter deviations for different actuator faults: 0: no significant change; +: large increase; −: large decrease

| Fault | Description | $R_A$ | $\nu\Psi$ | $c_{0e}$ | $J$ | $c_F$ | $M_{p1}$ | $M_{p2}$ |
|---|---|---|---|---|---|---|---|---|
| F1 | increased spring pretension | 0 | 0 | 0 | 0 | 0 | 0 | + |
| F2 | decreased spring pretension | 0 | 0 | 0 | 0 | 0 | 0 | − |
| F3 | commutator short circuit | − | − | 0 | + | + | + | 0 |
| F4 | armature winding short circuit | 0 | − | 0 | + | + | + | 0 |
| F5 | armature winding break | + | − | 0 | 0 | + | + | + |
| F6 | additional serial resistance | + | 0 | 0 | 0 | 0 | 0 | 0 |
| F7 | additional parallel resistance | − | − | 0 | 0 | + | + | 0 |
| F8 | increased gear friction | 0 | 0 | 0 | + | + | + | 0 |
| F9 | offset fault $U_A$ | 0 | 0 | +/− | 0 | 0 | 0 | 0 |
| F10 | offset fault $I_A$ | 0 | 0 | −/+ | 0 | 0 | 0 | +/− |
| F11 | offset fault $\varphi_K$ | 0 | 0 | 0 | 0 | 0 | 0 | −/+ |
| F12 | scale fault $U_A$ | +/− | +/− | +/− | +/− | +/− | +/− | +/− |
| F13 | scale fault $I_A$ | −/+ | 0 | 0 | +/− | +/− | +/− | +/− |
| F14 | scale fault $\varphi_K$ | 0 | −/+ | 0 | −/+ | −/+ | −/+ | −/+ |

$\varphi_{1K}$ and for parameter changes of $R_A$ and $\Psi$. The angular velocity $\dot{\varphi}_{1K}$ was determined by a state variable filter, which was also applied for low-pass filtering of $U_A(t)$ and $I_A(t)$. Figure 20.13 shows the time history of the residual for different faults. The residuals are normalized to the threshold value. The faults were introduced at time $t = 1$ s, and the threshold is exceeded after about 200 ms. An offset fault of the angle sensor exceeds the threshold only briefly (which is disadvantageous for fault detection). The parameter changes result in larger variances of the residual.

The experiments have further shown that some parameters change with temperature and load. In the electrical subsystem, these are the resistance $R_A$ and the flux linkage $\Psi$, which change by +50 % and −10 %, respectively, after 30 min of continuous operation. A modification of the parity equations according to Section 10.5 can then be used to track the resistance parameter $R_A$ and, dependent on it, the value of $\Psi$. This leads to a smaller variance of the residual $r_1(t)$ with little additional computational effort, [20.9].
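The residual (20.27) and its threshold-normalized evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in [20.9]: the parameter values, the fixed threshold and the assumption that $\dot{\varphi}_{1K}$ is already available (e.g. from a state variable filter) are hypothetical, and `parity_residual` and `first_alarm` are illustrative names.

```python
import numpy as np


def parity_residual(u_a, i_a, dphi, r_a, nu_psi):
    """Residual r1 = U_A - R_A*I_A - nu*Psi*dphi/dt, cf. (20.27)."""
    u_a, i_a, dphi = (np.asarray(x, dtype=float) for x in (u_a, i_a, dphi))
    return u_a - r_a * i_a - nu_psi * dphi


def first_alarm(residual, threshold, t):
    """Return the first time at which |r|/threshold exceeds 1, or None."""
    normalized = np.abs(residual) / threshold   # residual normalized to the threshold
    idx = np.flatnonzero(normalized > 1.0)
    return None if idx.size == 0 else t[idx[0]]
```

With nominal parameters taken from the fault-free state, a residual that stays below 1 after normalization indicates no fault; a resistance change shifts the residual in proportion to $I_A$.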

A further residual $r_2(t)$, the difference between the two position sensors, is used for a sensor fault-tolerant system described in [20.5].

20.2.4 Diagnostic equipment for quality control 

For quality control, e.g. as an end-of-line test after assembly, a special approach was developed. Figure 20.14 depicts the equipment, consisting of power electronics, an online-coupled digital signal processor and a host PC, [20.10]. Since the electrical throttle is disconnected from the engine and its control system, a special test cycle can be applied.
Fig. 20.13. Parity equation residual $r_1(t)$ for the electrical subsystem for PRBS excitation: a. fault-free; b. increase of resistance $R_A$ by +20 %; c. 20 % offset on throttle angle $\varphi_{1K}$; d. 20 % gain change of throttle angle $\varphi_{1K}$





Fig. 20.14. Diagnosis equipment for electrical throttles in the frame of quality control 
Fig. 20.15. Test cycle for the detection of different faults with different methods


Figure 20.15 shows the test cycle with five different phases requiring a total time of 9.4 s. At the beginning of the test, the throttle valve is feedforward-controlled in open loop to detect basic faults like an open armature circuit, short circuits and offset faults of the sensors. Beginning with phase 4, the throttle operates in closed loop to test the operating range and some mechanical parameters. During the last phase 5, a dynamic PRBS signal excites the position set point, and all electrical and mechanical parameters are estimated. Finally, a fuzzy logic rule-based diagnosis system allows 38 different faults to be detected based on 30 generated symptoms.

Summary 

Overall testing of the electro-mechanical throttle valve by measuring three signals requires a combination of methods such as limit checking, parameter estimation and a parity equation. Parameter estimation gave good results, allowing a deep fault diagnosis. However, parity equations are only suitable for the electrical part because of the difficulty of modelling the mechanical subsystem. Hence, it was the ability of parameter estimation to reduce modelling errors through its inherent regression that made a comprehensive fault diagnosis possible.
Fault detection and diagnosis of a centrifugal pump-pipe-system


21.1 The pump-pipe-tank system 

To develop an online fault detection and diagnosis method for a centrifugal pump-pipe-tank system over a large operating range, the plant shown in Figures 21.1 and 21.2 is considered. The pump is driven by an inverter-fed, variable-speed induction (squirrel-cage) motor, which is speed-controlled by a field-oriented controller.





Fig. 21.1. Centrifugal pump-pipe-tank plant with measurements: AC motor: Siemens 1LA 5090-2AA (standard motor), $P_N$ = 1.5 kW, $n_N$ = 2900 rpm; frequency converter: Lust MC 7404; centrifugal pump: Hilge, $H_{max}$ = 130 m, $\dot{V}_{max}$ = 14 m³/h, $P_{max}$ = 5.5 kW
Fig. 21.2. Photo of the investigated centrifugal pump


21.2 Mathematical models of the centrifugal pump 

The stator current vector $\underline{I}_S = I_{S\alpha} + i\,I_{S\beta}$ is measured and transformed into the reference frame defined by the rotor flux,

$$ \underline{I}_S' = I_{Sd} + i\,I_{Sq} \qquad (21.1) $$

where the rotor flux is obtained by using an adequate model, see [21.4], [21.2].

The motor torque can then be determined by

$$ M_{el} = k_T\,\Psi_R\,I_{Sq} \qquad (21.2) $$

where $k_T$ is known from the motor data sheet.

Further measurements are

$p_1$ pump inlet pressure
$p_2$ pump outlet pressure
$\omega$ pump speed
$\dot{V}$ volume flow

With the pressure difference $\Delta p = p_2 - p_1$, the quantity $H = \Delta p/(\rho g)$ is called the pump head.

The mathematical models of the pump have to be adapted to the pump-pipe-system. Based on the theoretically derived equations of centrifugal pumps, [21.3], the following equations are used here:

$$ H(t) = h_{nn}\,\omega^2(t) - h_{nv}\,\omega(t)\,\dot{V}(t) - h_{vv}\,\dot{V}^2(t) \qquad (21.3) $$

$$ H(t) = \frac{p_2(t) - p_1(t)}{\rho g} = \frac{\Delta p(t)}{\rho g} \qquad (21.4) $$

$$ J\,\dot{\omega}(t) = M_{el}(t) - M_{th}(t) - M_f(t) \qquad (21.5) $$

$$ M_{th}(t) = M_{th1}\,\omega(t)\,\dot{V}(t) - M_{th2}\,\dot{V}^2(t) \qquad (21.6) $$

$$ M_f(t) = M_{f0}\,\mathrm{sign}\,\omega(t) + M_{f1}\,\omega(t) \qquad (21.7) $$

An analysis of these theoretically derived equations has shown that, because the flow $\dot{V}$ is proportional to the speed $\omega$ and the viscous friction in (21.7) can be neglected, the following simplified relations can be used with $\Delta p = \rho g H$:

$$ \Delta p(t) = h_{nn}\,\omega^2(t) - h_{\omega}\,\omega(t) \qquad (21.8) $$

$$ J\,\dot{\omega}(t) = M_{el}(t) - M_{f0} - M_2\,\omega^2(t) \qquad (21.9) $$

The dynamics of the fluid in the pipe are described by

$$ H(t) = a_B\,\frac{d\dot{V}(t)}{dt} + h_{rr}\,\dot{V}^2(t) \qquad (21.10) $$

with $a_B = l/(gA)$ ($l$: pipe length, $A$: cross-sectional area). These models agree with [21.1] for a larger pump-pipe-system. Figure 21.3 shows the resulting signal flow diagram.


21.3 Parity equations and parameter estimation 
a) Measurement of $I$, $\omega$, $\Delta p$, $\dot{V}$
Nonlinear parity equations 

Based on these models and after discretization, the following residuals with nonlinear equations can be obtained, compare Figure 21.4 and [21.4], [21.5]:

• Static pump model (21.8)

$$ r_1(k) = \Delta p(k) - w_1\,\omega^2(k) - w_2\,\omega(k) \qquad (21.11) $$

• Dynamic pipe model (21.10) and further simplifications

$$ r_2(k) = \dot{V}(k) - w_3 - w_4\sqrt{\Delta p(k)} - w_5\,\dot{V}(k-1) \qquad (21.12) $$

• Dynamic pipe-pump model (21.8), (21.10)

$$ r_3(k) = \dot{V}(k) - w_3 - w_4\sqrt{\widehat{\Delta p}(k)} - w_5\,\dot{V}(k-1), \qquad \widehat{\Delta p}(k) = w_1\,\omega^2(k) + w_2\,\omega(k) \qquad (21.13) $$

• Dynamic inverse pump model (21.9)

$$ r_4(k) = M_{el}(k) - w_6 - w_7\,\omega(k) - w_8\,\omega(k-1) - w_9\,\omega^2(k) - w_{10}\,M_{el}(k-1) \qquad (21.14) $$
Fig. 21.4. Residual generation with parity equations for the pump-pipe system


The residuals $r_1(k)$, $r_2(k)$ and $r_3(k)$ are output residuals, which follow from comparing the measured $\Delta p(k)$ and $\dot{V}(k)$ with the corresponding model outputs. In contrast, $r_4(k)$ is an input residual, because $M_{el}$ is compared with the output of an inverse pump model. $r_2(k)$ and $r_3(k)$ include first-order flow sensor dynamics. The sampling time is $T_0 = 10$ ms.
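A sketch of how the residuals (21.11) to (21.14) could be evaluated from sampled signals. It assumes that the parameters $w_1, \dots, w_{10}$ are known constants (their operating-point dependence is ignored here) and that $\Delta p(k) \ge 0$; the function name `pump_residuals` is hypothetical. The residuals start at $k = 1$ because of the one-step delays.

```python
import numpy as np


def pump_residuals(dp, vdot, omega, m_el, w):
    """Residuals r1..r4 of (21.11)-(21.14), evaluated for k = 1..N-1.

    w holds w[1]..w[10]; w[0] is unused so that indices match the text.
    """
    dp, vdot, omega, m_el = (np.asarray(x, dtype=float) for x in (dp, vdot, omega, m_el))
    k, km1 = slice(1, None), slice(None, -1)   # samples k and k-1
    # static pump model: measured vs. modelled pressure difference
    r1 = dp[k] - w[1] * omega[k] ** 2 - w[2] * omega[k]
    # dynamic pipe model driven by the measured pressure difference
    r2 = vdot[k] - w[3] - w[4] * np.sqrt(dp[k]) - w[5] * vdot[km1]
    # pipe-pump model: the pressure difference comes from the pump model
    dp_model = w[1] * omega[k] ** 2 + w[2] * omega[k]
    r3 = vdot[k] - w[3] - w[4] * np.sqrt(dp_model) - w[5] * vdot[km1]
    # inverse pump model: input residual on the electrical torque
    r4 = (m_el[k] - w[6] - w[7] * omega[k] - w[8] * omega[km1]
          - w[9] * omega[k] ** 2 - w[10] * m_el[km1])
    return r1, r2, r3, r4
```

For data that satisfy the models exactly, all four residuals are zero; an offset on $\Delta p$ appears directly in $r_1$.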

The parameters $w_1, \dots, w_{10}$ of the parity equations follow directly from known physical data described in the equations above or are estimated, e.g. with least squares methods based on measurements of $I_{Sq}(t)$, $\omega(t)$, $\Delta p(t)$ and $\dot{V}(t)$. However, the parameters $w_i$ depend on the operating point, especially at low speed. Therefore, a multi-model approach is used for each residual. It turned out that it is sufficient to consider the parameters as dependent on the angular speed only.

To identify the multi-model for the parity equations, the pump system was excited by an amplitude-modulated PRBS signal over the whole operating range, and the parameters $w_i(\omega)$, $i = 1, \dots, 10$, were determined with the local linear model network LOLIMOT using three local models each, see Section 9.3.3. Figure 21.5 compares the measured values with those reconstructed by the models; the agreement is very good.
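The following sketch only illustrates how an already identified local model network evaluates operating-point dependent parameters $w(\omega)$: each local model contributes a constant parameter vector, weighted by normalized Gaussian validity functions of $\omega$. This is a simplified evaluation; the centers, widths and local parameter values are hypothetical, and the tree-construction algorithm with which LOLIMOT places and trains the local models is not shown.

```python
import numpy as np


def validity(omega, centers, sigmas):
    """Normalized Gaussian validity functions Phi_j(omega), shape (len(omega), M)."""
    omega = np.atleast_1d(np.asarray(omega, dtype=float))[:, None]
    mu = np.exp(-0.5 * ((omega - centers) / sigmas) ** 2)
    return mu / mu.sum(axis=1, keepdims=True)   # weights sum to one


def blended_parameters(omega, centers, sigmas, local_params):
    """w(omega) = sum_j Phi_j(omega) * w_j; local_params has shape (M, n_params)."""
    return validity(omega, centers, sigmas) @ np.asarray(local_params, dtype=float)
```

Near the center of a local model its parameters dominate; between two centers the parameters are interpolated smoothly.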

The following faults were introduced into the pump-pipe-system:

• offset faults of the sensors $\omega$, $\dot{V}$, $p_1$, $p_2$;

• increased flow resistance by stepwise closing of a valve after the pump;

• cavitation by stepwise closing of a valve before the pump;

• increased bearing friction by removing grease and introducing iron deposits;

• defective impeller by closing one channel between two vanes with silicone;

• sealing gap losses by opening a bypass valve;

• leakage between pump and flow measurement.

Table 21.1 shows the resulting symptoms. The residuals of the parity equations can be obtained without input excitation. They indicate that the sensor offset faults, sealing gap losses and increased bearing friction are strongly isolable. However, increased flow resistance, cavitation and impeller defects are only weakly isolable or not isolable at all. Hence, all faults are detectable, but some of them cannot be distinguished from each other. To avoid thresholds that are too large, adaptive thresholds are used: in addition to a constant value, the thresholds depend on a high-pass filtered value of the speed $\omega$, which increases the threshold during speed changes, [21.5].
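A minimal sketch of such an adaptive threshold, assuming a discrete first-order high-pass filter with time constant `tau` and a threshold of the form $c_0 + c_1\,|\omega_{HP}(k)|$. The filter structure, the constants `c0`, `c1` and the function name are assumptions for illustration, not the design of [21.5].

```python
import numpy as np


def adaptive_threshold(omega, t0, tau, c0, c1):
    """Threshold c0 + c1*|omega_HP(k)| with a first-order high-pass filter.

    t0: sampling time, tau: filter time constant (both in seconds).
    """
    omega = np.asarray(omega, dtype=float)
    alpha = tau / (tau + t0)            # discrete high-pass coefficient
    hp = np.zeros_like(omega)
    for k in range(1, len(omega)):
        hp[k] = alpha * (hp[k - 1] + omega[k] - omega[k - 1])
    return c0 + c1 * np.abs(hp)
```

At constant speed the threshold equals `c0`; a speed step raises it temporarily, and it decays back with the filter time constant.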
Fig. 21.5. Measured signals of the pump-pipe-system, LOLIMOT model outputs and their differences: (a) angular speed; (b) delivery pressure difference; (c) flow rate; (d) torque of AC motor
During normal operation, the nonlinear parity equations can be applied. To obtain a more detailed diagnosis, a parameter estimation with an amplitude-modulated PRBS (APRBS) test signal exciting the speed has to be started.

This dynamic excitation allows the parameters of the models (21.8), (21.9) and (21.10) to be estimated with a recursive least squares method as described in Section 9.2, see also [21.2], Section 6.6. The changes of these physically defined parameters are given in Table 21.1 and show that all faults are isolable and can therefore be diagnosed.
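As an illustration of the estimation step, the two coefficients of the static pump model (21.8) can be fitted to measured $\omega$ and $\Delta p$ by least squares. For brevity this sketch uses the batch (non-recursive) form with `numpy.linalg.lstsq`; a recursive estimator processes the same regressors sample by sample. The function name and the test values are hypothetical.

```python
import numpy as np


def estimate_pump_characteristic(omega, dp):
    """Least squares estimate of (h_nn, h_omega) in dp = h_nn*omega^2 - h_omega*omega."""
    omega = np.asarray(omega, dtype=float)
    psi = np.column_stack([omega ** 2, -omega])   # regressor matrix, signs as in (21.8)
    theta, *_ = np.linalg.lstsq(psi, np.asarray(dp, dtype=float), rcond=None)
    return theta
```

Comparing the estimates with their fault-free reference values yields the parameter deviations used as symptoms.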

b) Measurement of $I$, $\omega$

If the delivery pressure $\Delta p$ and the flow rate $\dot{V}$ are not measurable, the residual $r_4$ of the inverse pump model (21.14) can still be calculated from the measured speed $\omega$ and motor current $I_{Sq}$. This allows a sensor fault in $\omega$ and some pump faults to be detected. An additional parameter estimation determines the deviations of the parameters $J$, $M_{f0}$ and $M_2$ with (21.9) and isolates some more pump faults.

Similar results have been obtained by [21.1] for a larger pump with $P = 3.3$ kW and $\dot{V}_{max} = 150$ m³/h in a larger pipe circulation system with two heat exchangers. There, two different flow meters could be used, which allowed six residuals and four parameter estimates to be generated. Together with two residual variances, a total of 13 symptoms was obtained, which enabled the diagnosis of 11 different faults of sensors, pump and pipe system.

These symptoms were then used to train 20 fuzzy rules with the SELECT procedure described in Section 17.3.5, yielding a 100 % classification accuracy.

Summary 

This case study has again shown that (nonlinear) parity equations are suitable for some additive faults, whereas parameter estimation gives much more insight and allows especially parametric (multiplicative) faults to be detected and diagnosed. The best fault coverage is obtained by combining parity equations and parameter estimation.

Table 21.2 shows which faults are only detectable and which are also diagnosable with combined parity equations and parameter estimation. A minimal measurement configuration with the torque $M = f(I)$ and the speed $\omega$ allows a few faults to be detected, but not diagnosed. By adding sensors for $p_1$ and $p_2$, or for $\Delta p$, many more faults can be detected and diagnosed. The additional implementation of a flow rate sensor $\dot{V}$ has little influence on the number of detectable faults, but allows many more faults to be diagnosed. This shows that model-based fault detection is possible with some three to four sensors, but that fault diagnosis is improved considerably by one additional sensor (here the flow rate).
Table 21.2. Detectable and diagnosable faults depending on the sensors used. A modern frequency converter is assumed that is able to reconstruct the motor torque $M$ without additional sensors. Faults of the frequency converter itself and other electric faults in the motor are omitted. Faults in parentheses are difficult to identify, yet not completely impossible to identify (depending on the individual setup), [21.1].

| Fault detectable / sensor usage | $M$ | $M, \omega$ | $M, \omega, p_2$ | $M, \omega, p_1, p_2$ | $M, \omega, p_1, p_2, \dot{V}$ |
|---|---|---|---|---|---|
| Total breakdown | X | X | X | X | X |
| Defective blade wheel | | (X) | X | X | X |
| Incr. shaft or motor friction | | | X | X | X |
| Sensor fault $\omega$ | | X | X | X | X |
| Sensor fault $\dot{V}$ | | | | | X |
| Sensor fault $p_1$, $p_2$ | | | $p_2$ | X | X |
| Decreased flow resistance | | (X) | (X) | X | X |
| Increased flow resistance | | (X) | (X) | X | X |
| Cavitation through pressure reduction | … | … | … | … | … |
| Insufficient de-ventilation of sensors $p_1$, $p_2$ | | | $p_2$ | … | … |
Fault detection and diagnosis of an automotive suspension and the tire pressures


22.1 Mathematical model of a suspension and the test rig 

To perform fault diagnosis either in a service station, for example during technical inspection, or while driving, it is important to use easily measurable variables. If the methods are to be used for technical inspection, the sensors must be easy to attach to the car. For on-board fault detection, the variables already available for suspension control should be used. Variables that meet these requirements are the vertical accelerations of body and wheel, $\ddot{z}_B$ and $\ddot{z}_W$, and the suspension deflection $z_W - z_B$. Another important point is that the method should require only little a priori knowledge about the type of car.

A scheme for a simplified model of a car suspension system, a quarter car model, 
is shown in Figure 22.1. The following equations follow from force balances 

$$ m_B\,\ddot{z}_B(t) = c_B\,(z_W(t) - z_B(t)) + d_B\,(\dot{z}_W(t) - \dot{z}_B(t)) \qquad (22.1) $$

$$ m_W\,\ddot{z}_W(t) = -c_B\,(z_W(t) - z_B(t)) - d_B\,(\dot{z}_W(t) - \dot{z}_B(t)) + c_W\,(r(t) - z_W(t)) \qquad (22.2) $$
In this chapter, the following symbols are used:

$a_1, b_1$ parameters of transfer functions;
$c_B$ stiffness of body spring;
$c_W$ tire stiffness;
$d_B$ body damping coefficient;
$f_r$ resonance frequency;
$F_C$ Coulomb friction force;
$F_D$ damper force;
$F_S$ spring and damper force;
$F_W$ wheel force;
$m_B$ body mass;
$p_W$ wheel pressure;
$r$ road displacement;
$z_B$ vertical body displacement;
$z_W$ vertical wheel displacement;
$\Delta z_{WB} = z_W - z_B$ suspension deflection.


The small damping of the wheel is usually negligible. A survey of passive and 
semi-active suspensions and their models is given in [22.7]. 
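To make the quarter car equations (22.1) and (22.2) concrete, they can be simulated with a simple semi-implicit Euler scheme. This is a sketch for illustration only: gravity and tire damping are neglected as in (22.2), and the function name and all parameter values used with it are hypothetical.

```python
import numpy as np


def simulate_quarter_car(r, dt, m_b, m_w, c_b, d_b, c_w):
    """Simulate (22.1)-(22.2) for a road profile r (one value per time step dt)."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    z_b = np.zeros(n)
    z_w = np.zeros(n)
    v_b = v_w = 0.0                                   # body and wheel velocities
    for k in range(1, n):
        # spring and damper force acting on the body, cf. (22.1)
        f_s = c_b * (z_w[k - 1] - z_b[k - 1]) + d_b * (v_w - v_b)
        a_b = f_s / m_b
        a_w = (-f_s + c_w * (r[k - 1] - z_w[k - 1])) / m_w   # cf. (22.2)
        # semi-implicit Euler: update velocities first, then positions
        v_b += a_b * dt
        v_w += a_w * dt
        z_b[k] = z_b[k - 1] + v_b * dt
        z_w[k] = z_w[k - 1] + v_w * dt
    return z_b, z_w
```

For a road step both displacements settle at the step height; the step size `dt` must be small compared with the period of the wheel mode.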

The first results were obtained on the test rig shown in Figure 22.2, which is equipped with a continuously adjustable damper. The damping is controlled by a magnetic valve, which continuously opens or closes a bypass. The test rig was constructed primarily for investigations on semi-active, parameter-adaptive suspension control, [22.2].
Fig. 22.1. Quarter car model



Fig. 22.2. Quarter-car test rig of IAT, TU Darmstadt: a. scheme; b. photo 


22.2 Parameter estimation (test rig) 


In general, the relationship between force and velocity of a shock absorber is nonlinear. It is usually degressive and depends strongly on the direction of motion of the piston. In addition, the Coulomb friction of the damper should be taken into account. To approximate this behavior, the characteristic damper curve can be divided into $m$ sections as a function of the piston velocity. For these $m$ sections, the following equation, compare (22.1), is obtained:


$$ \ddot{z}_B = \frac{d_{B,i}}{m_B}\,(\dot{z}_W - \dot{z}_B) + \frac{c_B}{m_B}\,(z_W - z_B) + \frac{1}{m_B}\,F_{C,i}, \qquad i = 1, \dots, m \qquad (22.3) $$
$F_{C,i}$ denotes the force generated by Coulomb friction and $d_{B,i}$ the damping coefficient of each section. Using (22.3), the damping curve can be estimated with a standard parameter estimation algorithm by measuring the body acceleration $\ddot{z}_B$ and the suspension deflection $z_W - z_B$. The velocity $\dot{z}_W - \dot{z}_B$ can be obtained by numerical differentiation. In addition, either the body mass $m_B$ or the spring stiffness $c_B$ can be estimated; the other one must be known a priori. Using (22.1) and (22.2), other equations for parameter estimation can be obtained, e.g. (22.4), which can additionally be used to estimate the tire stiffness $c_W$:


$$ \ddot{z}_W = -\frac{d_{B,i}}{m_W}\,(\dot{z}_W - \dot{z}_B) - \frac{c_B}{m_W}\,(z_W - z_B) + \frac{c_W}{m_W}\,(r - z_W) - \frac{1}{m_W}\,F_{C,i} \qquad (22.4) $$


The disadvantage of this equation is that the distance between road and wheel, $r - z_W$, must be measured. This equation is therefore only feasible for technical inspection on a test stand, whereas in driving cars the high sensor costs prevent its use.

Figure 22.3 shows the damping curve estimated with (22.3) for different currents of the damper magnetic valve. Because a rising damper current opens the bypass, the damping decreases. The damping curve was divided into four sections, two for each direction of motion. The damping curves at different damper currents can clearly be distinguished. Since a worn damper results in a changed curve, it can be detected. In addition, various other faults can be distinguished, [22.2], [22.8], [22.9], [22.11], [22.6]. Table 22.1 gives an overview of the influence of these faults on the estimated variables. Every fault produces a different pattern of parameter estimates. All parametric faults are strongly isolable, except the sensor offset faults.
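The section-wise estimation with (22.3) can be sketched as one linear least squares problem: the velocity sections are encoded as indicator regressors, so that each section gets its own $d_{B,i}$ and $F_{C,i}$, while $c_B$ is shared. This sketch assumes that $m_B$ is known, that the velocity $\dot{z}_W - \dot{z}_B$ is already available, and that the section boundaries `edges` are chosen by the user; these choices and the function name are hypothetical.

```python
import numpy as np


def estimate_damper_sections(zdd_b, dz, dz_dot, m_b, edges):
    """Least squares fit of (22.3) with section-wise d_B,i and F_C,i.

    zdd_b: body acceleration, dz: deflection z_W - z_B, dz_dot: its velocity,
    m_b: known body mass, edges: increasing section boundaries of dz_dot
    (first and last may be -inf/+inf). Returns (c_B, d_B[i], F_C[i]).
    """
    dz, dz_dot = np.asarray(dz, dtype=float), np.asarray(dz_dot, dtype=float)
    m = len(edges) - 1
    sec = np.digitize(dz_dot, edges[1:-1])        # section index 0..m-1 per sample
    onehot = np.eye(m)[sec]                        # (N, m) section membership
    # columns: c_B*dz | d_B,i*dz_dot in section i | F_C,i in section i
    psi = np.column_stack([dz, onehot * dz_dot[:, None], onehot]) / m_b
    theta, *_ = np.linalg.lstsq(psi, np.asarray(zdd_b, dtype=float), rcond=None)
    return theta[0], theta[1:1 + m], theta[1 + m:]
```

With four sections (two per direction of motion), as in Figure 22.3, `edges` would contain one boundary per direction plus zero.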



Fig. 22.3. Estimated local linear damping characteristics using (22.3) 
Table 22.1. Influence of faults on the parameter estimates. +: increase, −: decrease, 0: no influence

| Fault | $d_{B-}$ | $d_{B+}$ | $c_{B-}$ | $c_{B+}$ | $F_{C-}$ | $F_{C+}$ |
|---|---|---|---|---|---|---|
| Process faults | | | | | | |
| Friction + | 0 | 0 | 0 | 0 | + | − |
| Damping + | + | + | 0 | 0 | 0 | 0 |
| Spring stiffness + | 0 | 0 | + | + | 0 | 0 |
| Sensor faults | | | | | | |
| Offset $\ddot{z}_B$ + | 0 | 0 | 0 | 0 | + | + |
| Offset $(z_W - z_B)$ + | 0 | 0 | 0 | 0 | − | − |
| Gain $\ddot{z}_B$ + | + | + | + | + | + | + |
| Gain $(z_W - z_B)$ + | − | − | − | − | 0 | 0 |

22.3 Parity equations (test rig) 


Parity equations do not need permanent excitation and require less computational effort than parameter estimation, but they do not give the same deep insight into the process. To combine the advantages of parameter estimation and parity equations, it is proposed to supervise the process online with parity equations and to perform a parameter estimation after a fault has been detected, compare Section 14.3 and [22.10]. In the following, an example of sensor fault detection with parity equations is described.

With the abbreviation

$$ \Delta z_{WB} = z_W - z_B \qquad (22.5) $$

the z-domain transfer function of (22.1) can be calculated as

$$ G_B(z) = \frac{\Delta z_{WB}(z)}{\ddot{z}_B(z)} = \frac{b_1 z^{-1}}{1 + a_1 z^{-1}} \qquad (22.6) $$

which leads to the residual equation with the discrete time $k = t/T_0 = 0, 1, 2, \dots$ and the sampling time $T_0$


$$ r(k) = \Delta z_{WB}(k) - b_1\,\ddot{z}_B(k-1) + a_1\,\Delta z_{WB}(k-1) \qquad (22.7) $$


The parameters $a_1$ and $b_1$ can be calculated by applying the z-transform to (22.1) or by discrete-time parameter estimation in a fault-free process state. Figure 22.4a shows the result for an offset of 0.1 V (approx. 3 % of the maximum value) added to the output of the acceleration sensor at time $t = 5$ s. Figure 22.4b shows the result for a sensor gain fault of 20 % starting at $t = 5$ s. In both cases, the residuals are divided by their thresholds. The offset fault clearly violates the threshold immediately. However, it cannot be distinguished which sensor, $z_W - z_B$ or $\ddot{z}_B$, causes the threshold violation. A sensor gain fault only affects the variance of the residual. Hence, the results show that in both cases a detection of the sensor fault is possible in principle, provided the process parameters remain constant. Since the residual cannot isolate the faulty sensor, a combination of parameter estimation according to Table 22.1 and the residual equation (22.7) gives the best fault coverage. Similar results can be obtained with the wheel acceleration sensor $\ddot{z}_W$ and the suspension deflection $(z_W - z_B)$, [22.2], [22.7]. These sensors are now used in series production of semi-active shock absorbers.
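A minimal sketch of the parity approach with (22.6) and (22.7): $a_1$ and $b_1$ are first estimated by least squares from fault-free data, and the residual is then evaluated on new data. The function names are hypothetical, and the sign convention follows (22.6), i.e. $\Delta z_{WB}(k) = -a_1\,\Delta z_{WB}(k-1) + b_1\,\ddot{z}_B(k-1)$.

```python
import numpy as np


def fit_first_order(dz, zdd):
    """Least squares estimate of a1, b1 in dz(k) = -a1*dz(k-1) + b1*zdd(k-1)."""
    dz, zdd = np.asarray(dz, dtype=float), np.asarray(zdd, dtype=float)
    psi = np.column_stack([-dz[:-1], zdd[:-1]])        # regressors at k-1
    (a1, b1), *_ = np.linalg.lstsq(psi, dz[1:], rcond=None)
    return a1, b1


def suspension_residual(dz, zdd, a1, b1):
    """Residual r(k) = dz(k) - b1*zdd(k-1) + a1*dz(k-1), cf. (22.7), for k >= 1."""
    dz, zdd = np.asarray(dz, dtype=float), np.asarray(zdd, dtype=float)
    return dz[1:] - b1 * zdd[:-1] + a1 * dz[:-1]
```

An offset on the acceleration signal shifts the residual by $-b_1$ times the offset, which is why the offset fault violates the threshold immediately.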



Fig. 22.4. Parity equation residual $r(k)$ for: a. acceleration sensor offset of 0.1 V at 5 s; b. acceleration sensor $\ddot{z}_B$ gain fault of 20 % at 5 s


22.4 Experimental results with a driving vehicle 

To test various methods in a driving car, a medium-class car, an Opel Omega (Figure 22.5), was equipped with sensors measuring the vertical accelerations of body and wheel as well as the suspension deflections. To realize different damping coefficients, the car is equipped with adjustable shock absorbers at the rear axle, which can be varied in three steps. Figure 22.6 shows the course of the damping coefficients, estimated as described in Section 22.2, at different damper settings for driving over boards of 2 cm height, Figure 22.5.

After approximately 2.5 s the estimated values converge to their final values. The estimated damping coefficients differ by approximately 10 % from the directly measured ones. Figure 22.7 shows the estimated characteristic curves at the different damper settings. The different settings are separable, and the different damping characteristics in compression and rebound are clearly visible, although the effect is not as strong as in the directly measured characteristic curve. More results are given in [22.1].


22.5 Shock absorber fault detection during driving 

All results are based on real measurements on a highway, [22.1].

a) Recursive parameter estimation (RLS) 

Figure 22.8 illustrates the suspension deflection $z_W - z_B$, its first derivative $\dot{z}_W - \dot{z}_B$ calculated with a state variable filter, and the body acceleration $\ddot{z}_B$ for the right rear wheel during a highway test drive. After 30, 60, 90 and 120 s, the shock absorber damping was changed.

Fig. 22.5. Driving experiment for model validation and parameter estimation

Fig. 22.6. Estimated damping coefficients for different damper settings (speed about 30 km/h)

Fig. 22.7. Estimated damping characteristics for different damper settings (abscissa: velocity [m/s])





Fig. 22.8. Measured signals on a highway with variation of the damper configuration 


Several estimations have shown that the recursive least squares algorithm (RLS) with an exponential forgetting factor yields very good results. This recursive parameter estimation is able to adapt to the different damping settings within about 10 s, see Figure 22.9.
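A generic recursive least squares estimator with exponential forgetting factor $\lambda$ is sketched below. The class name, the initial covariance `p0` and the value of $\lambda$ are assumptions; for the damping coefficient, the regressor vector would be built from the signals in (22.3).

```python
import numpy as np


class RLSForgetting:
    """Recursive least squares with exponential forgetting factor lam (0 < lam <= 1)."""

    def __init__(self, n_params, lam=0.98, p0=1e4):
        self.theta = np.zeros(n_params)      # parameter estimate
        self.P = p0 * np.eye(n_params)       # scaled covariance matrix
        self.lam = lam

    def update(self, phi, y):
        """Process one sample y(k) = phi(k)^T theta + e(k); return the new estimate."""
        phi = np.asarray(phi, dtype=float)
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        self.theta = self.theta + gain * (y - phi @ self.theta)
        # old data are discounted by dividing P by lam
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return self.theta
```

A smaller $\lambda$ shortens the effective memory (roughly $1/(1-\lambda)$ samples) and speeds up the adaptation to a changed damper setting, at the price of a noisier estimate.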

b) Principal component analysis (PCA) 

PCA, see Chapter 13, is first applied to the measured data in a normal situation, i.e. the medium damping configuration (30 s < t < 60 s and 90 s < t < 120 s). The measurements of all variables that either characterize or influence the vertical dynamics of the wheel suspension system are included in the reference data matrix $X$, i.e. the suspension deflection and body acceleration measured at the four suspension corners, roll angular velocity, pitch angular velocity, yaw angular velocity, longitudinal body acceleration, lateral body acceleration and vertical body acceleration. Since PCA is applied to a dynamic system, the data matrix $X$ is extended to include the lagged variables $z_W(k-1) - z_B(k-1)$, $\ddot{z}_B(k-1)$, $z_W(k-2) - z_B(k-2)$, $\ddot{z}_B(k-2)$ of all four suspensions. The data matrix $X$ thus consists of 30 measured variables in the time intervals 30 s < t < 60 s and 90 s < t < 120 s. Applying PCA shows that 24 principal components are sufficient to adequately describe the dynamic behavior of the car suspension system.

Fig. 22.9. Parameter estimate of the damping coefficient with RLS and forgetting factor. 0 s < t < 30 s: soft damping configuration; 30 s < t < 60 s and 90 s < t < 120 s: medium damping configuration (normal situation); 60 s < t < 90 s and 120 s < t < 150 s: hard damping configuration

A change in the damping configuration changes the relationships between the signals. It is therefore expected that the measurements do not project into the same region when this change occurs. Figure 22.10 shows the $T^2$ statistics

$$ T^2(k) = \sum_{j=1}^{M} \frac{t_j^2(k)}{\sigma_j^2} $$

calculated for the entire data set, compare (13.24). Here, $t_j(k)$ is the score and $\sigma_j^2$ the variance of the $j$-th principal component, and $M$ is the number of retained principal components. During the periods of abnormal process operation (soft and hard damping configuration), the $T^2$ statistics exceed the threshold only occasionally, thus indicating abnormal behavior (the grey areas in Figure 22.10). However, there are no false alarms: the $T^2$ statistics remain below the threshold when the damping configuration is normal. Overall, the parameter estimation gave better results than principal component analysis. A summary of various results for fault diagnosis of semi-active and active suspension systems is given in [22.3].
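The $T^2$ evaluation can be sketched as follows: the reference data are standardized, the loadings of the retained principal components are the eigenvectors of the covariance matrix of the standardized data, and $T^2(k)$ sums the squared scores divided by the component variances. This is an illustrative sketch with hypothetical function names; in particular, `add_lags` lags all columns, whereas the text above lags only the suspension deflections and body accelerations, and the threshold computation is omitted.

```python
import numpy as np


def add_lags(X, lags=2):
    """Append x(k-1), ..., x(k-lags) to each row; the first `lags` rows are dropped."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    return np.hstack([X[lags - i:n - i] for i in range(lags + 1)])


def fit_pca(X_ref, n_pc):
    """PCA of standardized reference data: mean, std, loadings P, PC variances."""
    mean, std = X_ref.mean(axis=0), X_ref.std(axis=0, ddof=1)
    cov = np.cov((X_ref - mean) / std, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_pc]        # keep the n_pc largest
    return mean, std, eigvec[:, order], eigval[order]


def t2_statistic(X, mean, std, P, var):
    """T^2(k) = sum_j t_j(k)^2 / sigma_j^2 over the retained components."""
    scores = ((X - mean) / std) @ P                # t_j(k)
    return np.sum(scores ** 2 / var, axis=1)
```

Data from a changed damper configuration are evaluated with the scaling and loadings of the reference data; larger $T^2$ values indicate operation outside the reference region.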
Fig. 22.10. $T^2$ statistical distance (dots) and the threshold (dashed line) for principal component analysis (PCA)

22.6 Tire pressure supervision with spectral analysis 

With spectral analysis of the measured wheel acceleration $\ddot{z}_W$, no measurement of the spring deflection is needed to observe changes of the tire stiffness $c_W$ due to tire pressure loss; only the vertical acceleration of the wheel has to be measured. Because the frequency range of the body vibrations is significantly lower than that of the wheel vibrations, the simplified model in Figure 22.11 can be applied.





Fig. 22.11. Simplified model to determine the tire stiffness 


The resonance frequency $f_r$ of this system is

$$ f_r = \frac{1}{2\pi}\sqrt{\frac{c_W + c_B}{m_W}} \qquad (22.8) $$

This frequency can be estimated by frequency estimation based on an AR parameter estimation method, see Section 8.1.6. Table 22.2 gives the estimated frequency and the corresponding tire pressure from test rig experiments.
Table 22.2. Estimated resonance frequency for different tire pressures (test rig)

| Tire pressure [bar] | 1.5 | 1.6 | 1.7 | 1.8 | 1.9 | 2.0 |
|---|---|---|---|---|---|---|
| Estimated frequency $f_r$ [Hz] | 12.23 | 12.39 | 12.56 | 12.7 | 12.85 | 12.91 |

This table reveals that, as expected, decreasing tire pressure results in a lower estimated resonance frequency. Note, however, that a wide-bandwidth excitation of the system is important for good results. Results obtained in a car driving on an Autobahn are shown in Figure 22.12. The absolute values differ from those obtained at the test rig because of different vehicle parameters.

The vertical wheel vibrations were measured with an accelerometer at a sampling rate of 200 Hz. The tire pressure was first set to its correct value of 2.0 bar and then reduced to 1.5 bar. Figure 22.12a shows that, although the estimated frequency varies within a range of approximately 0.5 Hz, the values for the lower pressure always stay below those for the normal tire pressure. To reduce the influence of the road excitation, the difference between the estimated frequencies of the front wheel and the corresponding rear wheel is calculated next. The results are given in Figure 22.12b. The variation of the signals is clearly reduced and the margin for detecting a deflation is increased. However, by calculating the difference between front and rear wheel, only relative pressure changes can be detected. If both tires lose the same amount of air, a detection with this method is infeasible. Therefore, a combination of both methods may be appropriate to improve the behavior.
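The resonance frequency (22.8) and its AR-based estimation can be sketched as follows. The sketch fits an AR(2) model to the wheel acceleration by least squares and takes the angle of the complex pole pair as the resonance frequency; this is one simple variant chosen for illustration, not necessarily the method of Section 8.1.6, and the function names are hypothetical. If the fitted poles are real, the returned value is not a meaningful resonance frequency.

```python
import numpy as np


def resonance_frequency(c_w, c_b, m_w):
    """Resonance frequency of the simplified wheel model, eq. (22.8), in Hz."""
    return np.sqrt((c_w + c_b) / m_w) / (2.0 * np.pi)


def ar2_resonance(x, fs):
    """Resonance frequency in Hz from an AR(2) model fitted by least squares.

    x: wheel acceleration samples, fs: sampling rate in Hz.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    psi = np.column_stack([x[1:-1], x[:-2]])           # x(k-1), x(k-2)
    (a1, a2), *_ = np.linalg.lstsq(psi, x[2:], rcond=None)
    poles = np.roots([1.0, -a1, -a2])                  # z^2 - a1*z - a2 = 0
    return abs(np.angle(poles[0])) * fs / (2.0 * np.pi)
```

With `fs = 200.0`, as in the driving experiment, a front/rear comparison would simply subtract the estimates of the two wheels.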



Fig. 22.12. a. Estimated resonance frequency at different tire pressure; b. Difference of the 
estimated frequencies between the front and rear wheel at different tire pressure 


A loss of approximately 0.4 bar can be detected, which makes it possible to detect slow punctures with little computational effort. A similar method using the body acceleration $\ddot{z}_B$ is described in [22.8], [22.5]. This method, which does not need tire pressure sensors, can be combined with observing the wheel speed differences between the wheels: with decreasing tire pressure, the rolling radius decreases and the wheel speed increases. A survey of tire pressure monitoring with direct and indirect measurements is given in [22.4].








Summary 

The applications for fault detection and diagnosis of suspensions have demonstrated that parameter estimation is best suited. Several faults in the suspension system can be isolated because of their unique patterns of parameter estimates. Parity equations make it possible to detect some faults, like sensor offsets, but not to diagnose them. A combination of both methods is therefore recommended. Tire pressure supervision without direct tire pressure measurement is possible with vibration analysis of the wheel acceleration, especially if combined with wheel speed differences.
Appendix


23.1 Terminology in fault detection and diagnosis 


The following definitions are the result of a coordinated action within the IFAC Technical Committee SAFEPROCESS, published in [23.8]. Some basic definitions can also be found in [23.13], [23.4] and in German standards like DIN and VDI/VDE-Richtlinien, see References at the end of this section.


1) States and Signals 


Fault: Unpermitted deviation of at least one characteristic property of the system;

Failure: Permanent interruption of a system's ability to perform a required function under specified operating conditions;

Malfunction: Intermittent irregularity in the fulfilment of a system's desired function;

Error: Deviation between a computed value (of an output variable) and the true, specified or theoretically correct value;

Disturbance: An unknown (and uncontrolled) input acting on a system;

Perturbation: An input acting on a system which results in a temporary departure from steady state;

Residual: Fault indicator, based on deviations between measurements and model-equation-based calculations;

Symptom: Change of an observable quantity from normal behavior.


2) Functions 

Fault detection: Determination of faults present in a system and time of detection;

Fault isolation: Determination of kind, location and time of detection of a fault by evaluating symptoms. Follows fault detection;

Fault identification: Determination of the size and time-variant behavior of a fault. Follows fault isolation;

Fault diagnosis: Determination of kind, size, location and time of detection of a fault by evaluating symptoms. Follows fault detection. Includes fault detection, isolation and identification;

Monitoring: A continuous real-time task of determining the possible conditions of a physical system, recognizing and indicating anomalies of the behavior;

Supervision: Monitoring a physical system and taking appropriate actions to maintain the operation in the case of faults;

Protection: Means by which a potentially dangerous behavior of the system is suppressed if possible, or means by which the consequences of a dangerous behavior are avoided.


3) Models 


Quantitative model: Use of static and dynamic relations among system variables and parameters in order to describe the system's behavior in quantitative mathematical terms;

Qualitative model: Use of static and dynamic relations among system variables and parameters in order to describe the system's behavior in qualitative terms such as causalities or if-then rules;

Diagnostic model: A set of static or dynamic relations which link specific input variables (the symptoms) to specific output variables (the faults);

Analytical redundancy: Use of two, but not necessarily identical, ways to determine a quantity, where one way uses a mathematical process model in analytical form.


4) System properties 

Reliability: Ability of a system to perform a required function under stated conditions, within a given scope, during a given period of time. Measure: MTTF (Mean Time To Failure), $\mathrm{MTTF} = 1/\lambda$, where $\lambda$ is the rate of failure [e.g. failures per hour];

Safety: Ability of a system not to cause danger to persons, equipment or the environment;

Availability: Probability that a system or equipment will operate satisfactorily and effectively at any point in time. Measure:

$$ A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} $$

with MTTR (Mean Time To Repair), $\mathrm{MTTR} = 1/\mu$, where $\mu$ is the rate of repair.
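The reliability and availability measures can be evaluated directly. The following Python sketch assumes constant failure and repair rates, as implied by $\mathrm{MTTF} = 1/\lambda$ and $\mathrm{MTTR} = 1/\mu$; the function names and the numbers in the usage line are illustrative and not taken from a standard.

```python
def mttf(failure_rate):
    """Mean Time To Failure for a constant failure rate lambda (e.g. 1/h)."""
    return 1.0 / failure_rate


def mttr(repair_rate):
    """Mean Time To Repair for a constant repair rate mu (e.g. 1/h)."""
    return 1.0 / repair_rate


def availability(failure_rate, repair_rate):
    """Availability A = MTTF / (MTTF + MTTR)."""
    up, down = mttf(failure_rate), mttr(repair_rate)
    return up / (up + down)


# Illustrative values: one failure per 1000 h, mean repair time of 10 h
print(mttf(1e-3), mttr(0.1), availability(1e-3, 0.1))
```

With these values MTTF = 1000 h, MTTR = 10 h and A = 1000/1010, i.e. roughly 0.990.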
ing). Beuth Verlag, Berlin, 1990.
DIN 40042 Zuverlässigkeit elektrischer Geräte, Anlagen und Systeme (Reliability
VDI/VDE-Richtlinie 3691. Erfassung von Zuverlässigkeitswerten bei Prozessrech-

23.2 State variable filtering of noisy signals to obtain signal derivatives

Some methods, such as parameter estimation for continuous-time models or residual
equations, need the derivatives $\dot{y}(t), \ddot{y}(t), \ldots$ of the measured signal $y(t)$. If
they cannot be measured directly, they have to be calculated from the mostly noisy
measurement of $y(t)$. One way is numerical differentiation in combination with
interpolation approaches (splines, Newton's method). However, this can only be applied
for very small noise. State variable filters (SVF) as proposed by [23.21], see Figure 23.1,
with the transfer function

$$ F(s) = \frac{y_f(s)}{y(s)} = \frac{1}{f_0 + f_1 s + \ldots + f_{n-1} s^{n-1} + s^n} $$

have proven to yield good results. The state variable filter is a low-pass filter that
provides the derivatives and at the same time attenuates the disturbance signals. Both
the input signal $u(t)$ and the output signal $y(t)$ are filtered with the state variable
filter. The filter parameters $f_i$ can be chosen relatively freely; a Butterworth design is
recommended, see [23.16]. A further possibility is the application of finite impulse
response (FIR) filters, where the derivatives of the impulse response of a low-pass filter
are convolved with the signal, [23.14], [23.11], see also [23.9].


[Figure: state variable filter with the outputs $y_f^{(n)}(t), \ldots, y_f^{(2)}(t), y_f^{(1)}(t), y_f(t)$]

Fig. 23.1. State variable filter
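A state variable filter can be realized as a chain of n integrators in controllable canonical form, so that the filtered signal and its derivatives are available as states. The Python sketch below is one possible discrete-time realization, not the implementation of [23.21]. It assumes forward-Euler integration with a small step dt and a numerator $f_0$ instead of 1, which gives unit static gain (a constant scaling cancels as long as $u(t)$ and $y(t)$ are filtered identically). The usage lines take the coefficients $f_0 = \omega_c^2$, $f_1 = \sqrt{2}\,\omega_c$ of a second-order Butterworth filter with a hypothetical cutoff $\omega_c$.

```python
import numpy as np


def state_variable_filter(y, dt, f):
    """Filter the samples y and return y_f and its derivatives up to order n.

    Realizes F(s) = f_0 / (f_0 + f_1 s + ... + f_{n-1} s^(n-1) + s^n) as a
    chain of n integrators, discretized with forward Euler (step dt).
    Column k of the result holds the k-th derivative of y_f.
    """
    f = np.asarray(f, dtype=float)
    n = f.size
    x = np.zeros(n)                     # x[k] = k-th derivative of y_f
    out = np.empty((len(y), n + 1))
    for i, y_i in enumerate(y):
        # highest derivative from the filter differential equation
        x_n = f[0] * y_i - f @ x
        out[i, :n] = x
        out[i, n] = x_n
        # integrator chain: dx[k]/dt = x[k+1], dx[n-1]/dt = x_n
        x = x + dt * np.append(x[1:], x_n)
    return out


# Usage: second-order Butterworth SVF with a hypothetical cutoff wc [rad/s]
dt, wc = 1e-3, 20.0
t = np.arange(0.0, 10.0, dt)
derivs = state_variable_filter(np.sin(t), dt, [wc**2, np.sqrt(2.0) * wc])
# after the transient: derivs[:, 1] is close to cos(t), derivs[:, 2] to -sin(t)
```

In this realization the highest derivative $y_f^{(n)}$ contains a direct feedthrough of the input, since $s^n F(s)$ is not strictly proper, so it is not noise-attenuated. It is then advisable to choose the filter order at least one higher than the highest derivative actually needed.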


23.3 Fuzzy logic - a short introduction¹

Typical human information processing is based on rules that are not precisely
formulated. They are built from both quantitative and qualitative experience and are
not as clearly structured as conventional computer programs require. [23.23] therefore
combined approaches of multi-valued logic into the so-called "fuzzy logic" in order to
simulate human reasoning. This allows human experience and knowledge to be
processed with digital computers. This section presents some of the basics of fuzzy
logic. After an introduction to the data representation, the basic steps of fuzzy
reasoning and its application to fault diagnosis are explained.

23.3.1 Basics 

Fuzzy logic processes imprecise information with the help of membership functions
and if-then rules. This requires certain inference mechanisms, data representations
and logic operators that enable human-like information processing. Good overviews
of these structures are given in [23.24] and [23.10]. The literature on the topic is so
extensive that a general overview cannot be given here.

The fuzzy logic approach has grown in popularity since its beginnings in the 1970s
and is today a standard tool for control system designers. It is used for classification,
modelling and control in a variety of forms and in combination with other tools.

¹ Compiled by Dominik Füssel, [23.3]
Fuzzy Sets

Fuzzy logic can be seen as an extension of classical logic. It relies on the use of
so-called fuzzy sets. While in classical logic an element either belongs to a set or does
not,

$$ a \in A \;\oplus\; a \notin A, \qquad (23.1) $$

fuzzy logic allows gradual states in between.

An example can be seen in Figure 23.2. The temperature is assigned to the three
fuzzy sets "low", "medium" and "high". Their membership functions are usually
denoted by $\mu(T)$. It should be stressed that the definition of these linguistic terms is
highly application specific: in a different problem, a "low" temperature could cover a
completely different physical range. A temperature of 90°C has the following
membership degrees:

$$ \mu_{\mathrm{low}}(90^\circ\mathrm{C}) = 0.5, \quad \mu_{\mathrm{medium}}(90^\circ\mathrm{C}) = 0.5, \quad \mu_{\mathrm{high}}(90^\circ\mathrm{C}) = 0. \qquad (23.2) $$

The temperature is therefore not "high", but at the same time both "low" and
"medium". This meets the expectations drawn from human reasoning: terms such as
"low" or "medium" are not always precisely definable. Rather, there is a range in which
a linguistic term gradually becomes valid, a range where it is fully valid and finally a
range where it ceases to be applicable. In this example, the domain over which the
fuzzy sets are defined is the complete range of possible values of the variable
"temperature". The range where the membership function of a fuzzy set is greater
than zero is called the support (area of support) of the fuzzy set. This term especially
applies to multi-dimensional fuzzy sets. The membership functions that define the
fuzzy sets can be described by algebraic equations:

$$ \mu_{\mathrm{low}}(T) = \begin{cases} 1, & 0 \le T < 80^\circ\mathrm{C} \\ 1 - (T - 80)/20, & 80^\circ\mathrm{C} \le T < 100^\circ\mathrm{C} \\ 0, & 100^\circ\mathrm{C} \le T \end{cases} \qquad (23.3) $$




[Figure: fuzzy sets "low", "medium", "high" and singleton "boiling point"; axis: temperature [°C], marks at 80, 100, 120]

Fig. 23.2. Example of fuzzy sets (left) and the special representation of a crisp data point with
fuzzy logic by a singleton (right).


The shape of the membership functions can generally be arbitrary. Trapezoidal and
triangular shapes are typical. Gaussian-shaped functions, sigmoid functions and
B-splines are increasingly popular for fuzzy sets. These functions are continuously
differentiable, which allows them to be easily used in nonlinear optimization procedures.
A special case of membership functions are singleton functions, which represent
crisp data values in the context of fuzzy logic. Figure 23.2 gives an example where
the fuzzy set is defined as:

$$ \mu_{\mathrm{boiling\,point}}(T) = \begin{cases} 1, & T = 100^\circ\mathrm{C} \\ 0, & T \neq 100^\circ\mathrm{C} \end{cases} \qquad (23.4) $$

The membership function value is one if the temperature equals 100°C, and zero
otherwise.
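Fuzzification with (23.3) and (23.4) amounts to evaluating piecewise-defined functions. The Python sketch below implements $\mu_{\mathrm{low}}$ and the singleton; the comparison tolerance in the singleton is an assumption of the sketch, since an exact equality test on measured floating-point values would almost never succeed.

```python
def mu_low(T):
    """Membership degree of temperature T [deg C] in "low", eq. (23.3).

    (23.3) is stated for T >= 0; values below 80 deg C are mapped to 1.
    """
    if T < 80.0:
        return 1.0
    if T < 100.0:
        return 1.0 - (T - 80.0) / 20.0
    return 0.0


def mu_boiling_point(T, tol=1e-9):
    """Singleton of eq. (23.4): 1 at 100 deg C (within tol), otherwise 0."""
    return 1.0 if abs(T - 100.0) <= tol else 0.0


print(mu_low(90.0), mu_boiling_point(100.0), mu_boiling_point(90.0))  # 0.5 1.0 0.0
```

The first value reproduces $\mu_{\mathrm{low}}(90^\circ\mathrm{C}) = 0.5$ from (23.2).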

The membership functions are defined to have values in the interval $0 \le \mu \le 1$
and are usually designed so that their sum equals unity at any value of the input
domain. This allows an interpretation similar to probabilities. The combination of all
fuzzy sets over a data range establishes a fuzzy variable ("temperature"). The fuzzy
sets represented by membership functions ("low", ...) are called attributes, terms or
labels.


Inference with If-Then Rules 

A typical fuzzy diagnosis system is composed of a set of if-then rules. If multiple
fault situations are to be distinguished, the structure shown in Figure 23.3 results. The
inputs are the symptoms $s_i$.

The aim of the fuzzy diagnosis is now to implement linguistic rules $R$ of the kind

$$ R:\ \mathrm{IF}\ \odot\{(\ldots), \ldots, (s_i \text{ is } A_i), \ldots, (\ldots)\}\ \mathrm{THEN}\ (f_j \text{ is } A_{F_j}) \qquad (23.5) $$

to draw conclusions from the symptoms $s_i$ to the fault measures $f_j$. The condition
(premise) of the rule in general comprises multiple linguistic statements that are
combined by operators $\odot$. An example is the statement "symptom $s_1$ increased" with
the linguistic variable "symptom $s_1$" and the linguistic value "increased".

The first step of the rule evaluation is the computation of the membership values
$\mu_A(s_i)$ of the symptoms to the linguistic value $A$. This is called fuzzification and
consists of evaluating the individual membership functions such as the one in (23.3).

The next step is the inference. This term denotes the evaluation of the linguistic
rules and the combination of the conclusions of the rule base into a linguistic
conclusion. The inference consists of the premise evaluation, the activation and the
accumulation.

The premise evaluation combines the membership degrees of the individual premise
terms of a rule, using the linguistic operators $\odot$ of the rule. The operators AND, OR
and NOT are intuitively understandable and known from traditional logic. A minimum
requirement for their implementation in fuzzy logic is that they coincide with the
Boolean operators for the crisp membership values 0 and 1. Commonly used operator
pairs are minimum/maximum, bounded difference/bounded sum and algebraic
product/algebraic sum. In many applications, however, the choice of the operator is
not vital for the function of the fuzzy system. Implementing OR as the maximum and
AND as the minimum is sufficient for many problems. This leads to the following rule
fulfillments:
Fig. 23.3. Fuzzy logic system for fault diagnosis.


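$$ \begin{aligned} \text{AND:}\quad & \mu_{A \cap B} = \min(\mu_A, \mu_B) \\ \text{OR:}\quad & \mu_{A \cup B} = \max(\mu_A, \mu_B) \\ \text{NOT:}\quad & \mu_{\bar{A}} = 1 - \mu_A \end{aligned} \qquad (23.6) $$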
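The minimum, maximum and complement operators of (23.6) are one-liners. The Python sketch below also checks the minimum requirement stated above, namely that the operators reduce to Boolean logic for the crisp values 0 and 1; the membership degrees in the premise example are hypothetical.

```python
from itertools import product


def fuzzy_and(*mu):
    """AND as minimum, eq. (23.6)."""
    return min(mu)


def fuzzy_or(*mu):
    """OR as maximum, eq. (23.6)."""
    return max(mu)


def fuzzy_not(mu):
    """NOT as complement, eq. (23.6)."""
    return 1.0 - mu


# For crisp memberships 0 and 1 the operators coincide with Boolean logic
for a, b in product((0.0, 1.0), repeat=2):
    assert fuzzy_and(a, b) == float(bool(a) and bool(b))
    assert fuzzy_or(a, b) == float(bool(a) or bool(b))
    assert fuzzy_not(a) == float(not bool(a))

# Premise "s1 is increased AND NOT (s2 is increased)" with hypothetical
# membership degrees mu_increased(s1) = 0.7 and mu_increased(s2) = 0.2
fulfillment = fuzzy_and(0.7, fuzzy_not(0.2))
print(fulfillment)  # 0.7
```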

The activation applies the rule fulfillments to the rule consequences. Typical
methods are limiting (clipping) or scaling (product) of the output membership
functions. In addition, a rule can be weighted with a number between 0 and 1.

The accumulation combines the activated output membership functions for every
linguistic output variable. This step unites the outputs of the individual rules and is
done by a maximum operation. An alternative approach is summation of the
membership functions. The result of the accumulation is a fuzzy set for each linguistic
output variable.

The fuzzy set is transformed into a crisp output value by the inverse step of
fuzzification: the defuzzification. Possible methods are the maximum, center-of-gravity
or mean-of-maximum methods. Defuzzification becomes especially simple if output
singletons are used; the methods then simplify to a weighted sum of the singleton
values (normalized by the sum of the weights for the center of gravity).
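For output singletons, activation by scaling and center-of-gravity defuzzification collapse into a weighted mean of the singleton positions. A minimal Python sketch with hypothetical rule fulfillments and singleton positions; rules that share the same singleton are simply listed separately, which corresponds to accumulation by summation rather than by maximum.

```python
def defuzzify_singletons(fulfillments, singletons):
    """Center of gravity for output singletons.

    fulfillments[k]: degree of fulfillment of rule k (scales its singleton)
    singletons[k]:   position of the output singleton of rule k
    """
    total = sum(fulfillments)
    if total == 0.0:
        raise ValueError("no rule is active")
    return sum(w * y for w, y in zip(fulfillments, singletons)) / total


# Two active rules with hypothetical singleton outputs 0.0 and 1.0
print(defuzzify_singletons([0.25, 0.75], [0.0, 1.0]))  # 0.75
```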
23.3.2 Simplification for Fault Diagnosis

For typical fault diagnosis applications, the standard fuzzy system can be reduced.
The main reason is the desired output of the diagnosis: instead of an arbitrary value of
a continuous variable, the output is a fault measure that gradually expresses the
possibility of the corresponding fault. If the observed symptoms are far from the
linguistically defined pattern, this fault measure will be close to zero, whereas a perfect
match will yield a fault measure of one.

Hence, the higher the fault measure, the more likely it is that the corresponding fault
situation has occurred. Although this concept is very similar to mathematical
probability, calling it a probability would, strictly speaking, be incorrect, since no
statistical evaluation of the events is performed.

Since the rule consequence is reduced to a statement about which fault has occurred,
it can be represented by a singleton value that is scaled by the rule fulfillment.
Therefore, no further output membership functions are necessary, and defuzzification
is not required either. The resulting simplified fuzzy logic system structure can be
seen in Figure 23.4.


[Figure: rule blocks "IF s1 small AND s2 medium THEN fault f" and "IF s3 medium AND s4 small THEN fault f", each scaling an output singleton]

Fig. 23.4. Fuzzy logic system for fault diagnosis. The defuzzification of a linguistic
variable is not necessary; instead, scaling of the singleton yields a fault measure in the
range 0 ... 1.
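The simplified structure can be sketched in a few lines of Python. The membership functions "small" and "medium" for normalized symptoms in [0, 1] and the fault names are hypothetical placeholders (the fault indices of Figure 23.4 are not recoverable here). AND is implemented as the minimum, and each rule scales a singleton of value 1, so the fault measure equals the rule fulfillment.

```python
def small(s):
    """Hypothetical 'small' set: 1 at s = 0, falling linearly to 0 at s = 0.5."""
    return max(0.0, min(1.0, 1.0 - s / 0.5))


def medium(s):
    """Hypothetical 'medium' set: triangle on [0, 1] with its peak at 0.5."""
    return max(0.0, 1.0 - abs(s - 0.5) / 0.5)


# Rules in the style of Fig. 23.4: (symptom index, membership function) pairs
RULES = {
    "fault_a": [(0, small), (1, medium)],   # IF s1 small AND s2 medium
    "fault_b": [(2, medium), (3, small)],   # IF s3 medium AND s4 small
}


def fault_measures(symptoms, rules=RULES, singleton=1.0):
    """Fault measure in [0, 1] per fault: AND (minimum) of the premise
    memberships, used to scale the output singleton of the rule."""
    return {
        fault: singleton * min(mf(symptoms[i]) for i, mf in premise)
        for fault, premise in rules.items()
    }


print(fault_measures([0.1, 0.45, 0.9, 0.0]))  # fault_a about 0.8, fault_b about 0.2
```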


23.4 Estimation of physical parameters for dynamic processes 

23.4.1 Introduction 

Many contributions exist on the estimation of the parameters $\boldsymbol{\theta}$ of input-output
models from measured input-output signals, e.g. [23.2], [23.19], [23.22], [23.6].
Publications on the determination of the physically defined parameters $\mathbf{p}$ of the laws
which govern the process dynamics are, for example, [23.20], [23.17], [23.1], see
Figure 23.5. The following treatment of this problem follows [23.5].
The physical parameters will be called process coefficients. For usual tasks in
control engineering, e.g. the design of control systems, knowledge of the process
parameters $\boldsymbol{\theta}$ is in general sufficient. However, knowledge of the process coefficients
$\mathbf{p}$ is required for the following problems:

(a) determination of non-measurable coefficients in natural sciences; 

(b) checking of performance data for technical systems; 

(c) supervision and fault diagnosis during online operation of technical processes; 

(d) quality control in manufacturing. 

The determination of process parameters $\boldsymbol{\theta}$ from measured input and output
signals with parameter estimation methods has reached a mature state during the
last 30 years. For linear and some nonlinear processes with stochastic disturbances,
several methods are known. They show good convergence if the model structure fits
the process structure, the input sufficiently excites the dynamics and the number of
parameters does not exceed about four to six for SISO systems.

The model parameters depend on the physical process coefficients through more or
less complicated algebraic relations

$$ \boldsymbol{\theta} = \mathbf{f}(\mathbf{p}) \qquad (23.7) $$

Fig. 23.5. Dynamic process model with input-output model parameters $\boldsymbol{\theta}$ and physical process
coefficients $\mathbf{p}$


These relations are known from theoretical modelling. The task now consists of
determining the physical process coefficients based on the measured input and output
signals $u(t)$ and $y(t)$. A straightforward possibility is first to estimate the model
parameters $\boldsymbol{\theta}$ and then to use the inverse relationship

$$ \mathbf{p} = \mathbf{f}^{-1}(\boldsymbol{\theta}) \qquad (23.8) $$

However, the following questions arise: 

(a) Can the process coefficients $\mathbf{p}$ be determined uniquely (process coefficient
identifiability)?

(b) Which signals must be measured in order to determine the process coefficients
$\mathbf{p}$?

(c) What is the influence of a priori known process coefficients $p_i$ on the variance
of the estimates of $\boldsymbol{\theta}$ and $\mathbf{p}$?

In this context, [23.17] considered the identifiability of biological models, especially
compartmental models, with computer algebra. [23.1] proposed a two-step procedure
to estimate physical parameters under the assumption that a considerable number of
physical parameters is known. The unknown parameters are then estimated by a
gradient method with the known parameters as boundary conditions. [23.12] gave
some basic relationships between physically defined process elements and the
resulting model structure and showed that the model parameters are sums of products
of process coefficients.

23.4.2 On the model structure for processes with lumped parameters 

The basic equations of dynamic processes with lumped parameters, in the form of
balance equations, constitutive equations and phenomenological laws, can be
represented in a unified form, [23.7].

It is now assumed that the process is linear and possesses $M$ measurable signals
$\eta_j(t)$ and $N$ non-measurable signals $\xi_j(t)$. For $L$ elements the Laplace-transformed
process element equations become

$$ \sum_{j=1}^{M} g_{ij}\, \eta_j(s) = \sum_{j=1}^{N} h_{ij}\, \xi_j(s), \qquad i = 1, 2, \ldots, L \qquad (23.9) $$

with

$$ g_{ij} \in \{0, \pm 1, \alpha_{ij} s^{\kappa}\}, \quad h_{ij} \in \{0, \pm 1, \beta_{ij} s^{\kappa}\}, \quad \kappa \in \{-1, 0, 1\}. \qquad (23.10) $$

In matrix notation this is

$$ \mathbf{G}\, \boldsymbol{\eta} = \mathbf{H}\, \boldsymbol{\xi} \qquad (23.11) $$

with

$$ \boldsymbol{\eta}^T = [\eta_1, \eta_2, \ldots, \eta_M], \quad \boldsymbol{\xi}^T = [\xi_1, \xi_2, \ldots, \xi_N] \qquad (23.12) $$

$$ \dim(\mathbf{G}) = L \times M, \quad \dim(\mathbf{H}) = L \times N. \qquad (23.13) $$

It is further assumed that the process element equations are linearly independent. For
parameter estimation, a model structure with only measurable signals is required.
The non-measurable signals are eliminated by transforming the matrix $\mathbf{H}$ into upper
triangular form or by successive substitution of the individual equations. One then
obtains an input/output differential equation

$$ a_n^* y^{(n)}(t) + \ldots + a_1^* y^{(1)}(t) + a_0^* y(t) = b_0^* u(t) + b_1^* u^{(1)}(t) + \ldots + b_m^* u^{(m)}(t) \qquad (23.14) $$

However, for continuous-time parameter estimation the following form is required

$$ a_n y^{(n)}(t) + \ldots + a_1 y^{(1)}(t) + y(t) = b_0 u(t) + b_1 u^{(1)}(t) + \ldots + b_m u^{(m)}(t) \qquad (23.15) $$

such that the regression model

$$ y(t) = \boldsymbol{\psi}^T(t)\, \boldsymbol{\theta} \qquad (23.16) $$

with

$$ \boldsymbol{\psi}^T(t) = [-y^{(1)}(t) \ \ldots \ -y^{(n)}(t) \;|\; u(t) \ u^{(1)}(t) \ \ldots \ u^{(m)}(t)] $$

$$ \boldsymbol{\theta}^T = [\theta_1, \theta_2, \ldots, \theta_r] = [a_1 \ \ldots \ a_n \;|\; b_0 \ \ldots \ b_m] $$

can be obtained. Hence, all parameters in (23.14) have to be multiplied by $1/a_0^*$ and
the number of parameters $\theta_j$ reduces by one.

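In the system of basic equations (23.11) the process coefficients

$$ \mathbf{p}^T = [p_1, p_2, \ldots, p_l] \qquad (23.17) $$

appear separately in their original form. The elements of $\mathbf{G}$ and $\mathbf{H}$ are

$$ g_{ij} = p_{g_{ij}}\, s^{\kappa}, \quad h_{ij} = p_{h_{ij}}\, s^{\kappa}, \qquad \kappa \in \{-1, 0, 1\} \qquad (23.18) $$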

After transformation into upper triangular form, the input-output model appears in
the last row, [23.12]. The model parameters then take the form

$$ \theta_i = c_{i1}\, p_1^{\gamma_{11}} p_2^{\gamma_{12}} \cdots p_l^{\gamma_{1l}} + c_{i2}\, p_1^{\gamma_{21}} p_2^{\gamma_{22}} \cdots p_l^{\gamma_{2l}} + \ldots \qquad (23.19) $$

Hereby it is assumed that each product $z_\mu$ can be represented by

$$ z_\mu = \prod_{\nu=1}^{l} p_\nu^{\gamma_{\mu\nu}}, $$

so that

$$ \theta_i = \sum_{\mu=1}^{q} c_{i\mu}\, z_\mu. \qquad (23.20) $$

Therefore the parameters are algebraic functions of the process coefficients $p_\nu$. For
the second order electrical circuit of Section 23.4.4 these relations are
$$ p_1 = R_1, \quad p_2 = C_1, \quad p_3 = R_2, \quad p_4 = C_2 $$

$$ \begin{aligned} \theta_1 &= a_1 = z_1 + z_2 + z_3 = p_1 p_2 + p_3 p_4 + p_1 p_4 \\ \theta_2 &= a_2 = z_4 = p_1 p_2 p_3 p_4 \\ \theta_3 &= b_1 = z_5 + z_6 = p_2 + p_4 \\ \theta_4 &= b_2 = z_7 = p_2 p_3 p_4 \end{aligned} \qquad (23.21) $$


Hence the z /t are abbreviations for the existing products and single values of the 
process coefficients p \,..., p/. 

(23.20) can now be written in matrix form

$$ \boldsymbol{\theta} = \mathbf{C}\, \mathbf{z} \qquad (23.22) $$
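For the circuit of Section 23.4.4 (with $a_1$, $a_2$, $b_1 = C_1 + C_2$ and $b_2 = R_2 C_1 C_2$ as given there), the matrix form $\boldsymbol{\theta} = \mathbf{C}\,\mathbf{z}$ can be written out and checked numerically. In the Python sketch below, the numbering of the products $z_1, \ldots, z_7$ follows the code comments and is chosen for illustration; the component values in the usage line are hypothetical.

```python
import numpy as np


def z_of_p(p):
    """Products z_mu of the process coefficients p = [R1, C1, R2, C2]."""
    R1, C1, R2, C2 = p
    return np.array([
        R1 * C1,            # z1
        R2 * C2,            # z2
        R1 * C2,            # z3
        R1 * C1 * R2 * C2,  # z4
        C1,                 # z5
        C2,                 # z6
        R2 * C1 * C2,       # z7
    ])


# theta = [a1, a2, b1, b2] = C z with constant coefficients c_i_mu
C = np.array([
    [1, 1, 1, 0, 0, 0, 0],  # a1 = z1 + z2 + z3
    [0, 0, 0, 1, 0, 0, 0],  # a2 = z4
    [0, 0, 0, 0, 1, 1, 0],  # b1 = z5 + z6
    [0, 0, 0, 0, 0, 0, 1],  # b2 = z7
], dtype=float)


def theta_of_p(p):
    """Model parameters theta = C z(p), cf. (23.22)."""
    return C @ z_of_p(p)


print(theta_of_p([2.0, 0.5, 3.0, 0.25]))  # hypothetical R1, C1, R2, C2
```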


In general, the process model parameters $\theta_i$ are first determined by parameter
estimation from the measured signals. This was treated in Section 9.2.5 using (23.16),
compare (9.116).

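If some parameters $\boldsymbol{\theta}''$ are known, the parameter vector is split as

$$ \boldsymbol{\theta} = [\boldsymbol{\theta}'^T \ \boldsymbol{\theta}''^T]^T \qquad (23.23) $$

where $\boldsymbol{\theta}'$ are the unknown parameters to be estimated. The signal terms belonging
to the known parameters are moved to the left-hand side,

$$ y''(t) = y(t) - \boldsymbol{\psi}''^T(t)\, \boldsymbol{\theta}'' = \boldsymbol{\psi}'^T(t)\, \boldsymbol{\theta}' + e(t). \qquad (23.24) $$

With the data matrix $\boldsymbol{\Psi}'$, whose rows are $\boldsymbol{\psi}'^T(t)$ at the measurement instants, and
the vector $\mathbf{y}''$ of the corresponding values $y''(t)$, the LS estimate becomes

$$ \hat{\boldsymbol{\theta}}' = \left[\boldsymbol{\Psi}'^T \boldsymbol{\Psi}'\right]^{-1} \boldsymbol{\Psi}'^T \mathbf{y}''. \qquad (23.25) $$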
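The partitioned estimate (23.23) to (23.25) can be sketched in Python with NumPy. The function name and the dictionary interface for the known parameters $\boldsymbol{\theta}''$ are assumptions of this sketch. Instead of forming $[\boldsymbol{\Psi}'^T \boldsymbol{\Psi}']^{-1}$ explicitly, it calls `numpy.linalg.lstsq`, which returns the same least-squares solution when $\boldsymbol{\Psi}'$ has full column rank but is numerically more robust. The synthetic regressors in the usage lines stand in for the filtered signal derivatives of (23.16).

```python
import numpy as np


def ls_with_known_parameters(Psi, y, known):
    """Least-squares estimate with partly known parameters, (23.23)-(23.25).

    Psi   : (N, r) data matrix, row k holds psi^T at sample k
    y     : (N,) output samples
    known : dict {column index: value} of the known parameters theta''
    Returns the full parameter vector with the known values inserted.
    """
    r = Psi.shape[1]
    idx_known = sorted(known)
    idx_unknown = [j for j in range(r) if j not in known]
    theta_known = np.array([known[j] for j in idx_known], dtype=float)
    # y'' = y - Psi'' theta'': move the known part to the left-hand side
    y2 = y - Psi[:, idx_known] @ theta_known
    # theta' minimizes ||y'' - Psi' theta'||^2
    theta_unknown, *_ = np.linalg.lstsq(Psi[:, idx_unknown], y2, rcond=None)
    theta = np.empty(r)
    theta[idx_known] = theta_known
    theta[idx_unknown] = theta_unknown
    return theta


# Usage with synthetic, noise-free data (hypothetical parameter values)
rng = np.random.default_rng(0)
Psi = rng.standard_normal((200, 4))
theta_true = np.array([0.5, 0.06, 1.2, 0.3])
y = Psi @ theta_true
print(ls_with_known_parameters(Psi, y, {1: 0.06}))  # close to theta_true
```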

Simulations with a second order process have shown that a considerable improvement
of the convergence is obtained if:

(i) one parameter is known: $a_2$ or $b_1$, i.e. one of the parameters with the largest
variances;

(ii) several parameters are known: then $a_2$ or $b_1$ must be among the known
parameters. Any other known parameters must be known very precisely.

23.4.3 Calculation of the physical process coefficients 

Since the process parameters $\boldsymbol{\theta}$ are nonlinear algebraic functions of the process
coefficients $\mathbf{p}$, no generally applicable solution for the unknown process coefficients
$p_\nu$ in the form of (23.8) can be given. For models of first or second order, a direct
solution can be found in most cases. For higher-order systems, one can try to solve
successively for the unknown $p_\nu$ or use computer algebra, [23.20], [23.18].
However, an identifiability condition for the process coefficients can be given
independently of the solution method. The basic relation (23.22) is written in implicit
form as

$$ \mathbf{q} = \boldsymbol{\theta} - \mathbf{C}\, \mathbf{z} = \mathbf{0} \qquad (23.26) $$

where $\mathbf{z} = \mathbf{z}(\mathbf{p})$. According to the implicit function theorem, [23.15], a sufficient
condition for a unique solution for $\mathbf{p}$ in the neighborhood of the solution $\mathbf{p}_0$ is that
the functional determinant does not vanish,

$$ \det \mathbf{Q}_p \neq 0 \qquad (23.27) $$

where $\mathbf{Q}_p$ is the functional matrix

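$$ \mathbf{Q}_p = \frac{\partial \mathbf{q}^T}{\partial \mathbf{p}} = \begin{bmatrix} \dfrac{\partial q_1}{\partial p_1} & \dfrac{\partial q_2}{\partial p_1} & \cdots & \dfrac{\partial q_r}{\partial p_1} \\ \dfrac{\partial q_1}{\partial p_2} & \dfrac{\partial q_2}{\partial p_2} & \cdots & \dfrac{\partial q_r}{\partial p_2} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial q_1}{\partial p_l} & \dfrac{\partial q_2}{\partial p_l} & \cdots & \dfrac{\partial q_r}{\partial p_l} \end{bmatrix} \qquad (23.28) $$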

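Since the determinant is only defined for a square matrix, this implies $r = l$, i.e. the
number $l$ of process coefficients must equal the number $r$ of model parameters. For
the example in Section 23.4.4, with $\boldsymbol{\theta}^T = [a_1\ a_2\ b_1\ b_2]$ and $\mathbf{p}^T = [R_1\ C_1\ R_2\ C_2]$,
this gives

$$ \det \mathbf{Q}_p = -C_1 R_2^2 C_2^3 . $$

The process coefficients are therefore identifiable if

$$ C_1 \neq 0 \;\wedge\; C_2 \neq 0 \;\wedge\; R_2 \neq 0 . $$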
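The determinant condition can also be checked numerically at a given operating point, which is convenient when no closed-form expression is at hand. The Python sketch below approximates $\mathbf{Q}_p$ by central differences of $\boldsymbol{\theta}(\mathbf{p})$ for the circuit of Section 23.4.4 ($\boldsymbol{\theta} = [a_1\ a_2\ b_1\ b_2]^T$, $\mathbf{p} = [R_1\ C_1\ R_2\ C_2]^T$); the step size and the coefficient values are hypothetical. Since each $\theta_i$ is linear in every single $p_\nu$, the central differences are exact up to rounding here.

```python
import numpy as np


def theta_circuit(p):
    """theta = [a1, a2, b1, b2] of the circuit for p = [R1, C1, R2, C2]."""
    R1, C1, R2, C2 = p
    return np.array([R1 * C1 + R2 * C2 + R1 * C2,
                     R1 * C1 * R2 * C2,
                     C1 + C2,
                     R2 * C1 * C2])


def functional_matrix(p, h=1e-6):
    """Q_p = d q^T / d p for q = theta - C z(p), by central differences.

    Row nu holds the derivatives of q_1 ... q_r with respect to p_nu; since
    theta is fixed, d q / d p = -d theta(p) / d p.
    """
    p = np.asarray(p, dtype=float)
    rows = []
    for nu in range(p.size):
        dp = np.zeros_like(p)
        dp[nu] = h
        rows.append(-(theta_circuit(p + dp) - theta_circuit(p - dp)) / (2.0 * h))
    return np.vstack(rows)


p0 = np.array([2.0, 0.5, 3.0, 0.25])   # hypothetical R1, C1, R2, C2
print(np.linalg.det(functional_matrix(p0)))  # close to -C1*R2**2*C2**3 = -0.0703125
```

Setting $C_2 = 0$ (or $C_1 = 0$, $R_2 = 0$) makes the determinant vanish, in line with the identifiability condition.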

23.4.4 Example: Second order electrical circuit 

The basic equations are (Figure 23.6):



Fig. 23.6. Electrical second order network with block diagram 
$$ \begin{aligned} U_{R1}(t) &= R_1 I(t) \\ I_1(t) &= C_1 \dot{U}_{C1}(t) \\ I(t) &= I_1(t) + I_2(t) \\ U_1(t) - U_{R1}(t) - U_{C1}(t) &= 0 \\ U_{R2}(t) &= R_2 I_2(t) \\ I_2(t) &= C_2 \dot{U}_{C2}(t) \\ U_{C1}(t) - U_{R2}(t) - U_{C2}(t) &= 0 \\ U_2(t) &= U_{C2}(t) \end{aligned} $$

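For the input $U_1(t)$ and the output $I(t)$ one obtains:

$$ a_2 \ddot{I}(t) + a_1 \dot{I}(t) + I(t) = b_1 \dot{U}_1(t) + b_2 \ddot{U}_1(t) $$

$$ a_2 = R_1 C_1 R_2 C_2, \quad a_1 = R_1 C_1 + R_2 C_2 + R_1 C_2, \quad b_1 = C_1 + C_2, \quad b_2 = R_2 C_1 C_2 $$

The process coefficients follow from

$$ \begin{aligned} R_1 &= \frac{a_2}{b_2} \\ R_2 &= -\frac{a_1^2 b_2^2 - 2 a_1 b_1 a_2 b_2 + b_1^2 a_2^2}{b_2 \left(b_2^2 - a_1 b_1 b_2 + b_1^2 a_2\right)} \\ C_1 &= \frac{b_2^2}{a_1 b_2 - a_2 b_1} \\ C_2 &= -\frac{b_2^2 - a_1 b_1 b_2 + b_1^2 a_2}{a_1 b_2 - b_1 a_2} \end{aligned} $$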
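The inversion can be verified by a round trip $\mathbf{p} \to \boldsymbol{\theta} \to \mathbf{p}$. A minimal Python sketch with hypothetical component values; the function names are illustrative. The denominators $b_2$, $a_1 b_2 - a_2 b_1 = b_2 R_2 C_2$ and $b_2^2 - a_1 b_1 b_2 + b_1^2 a_2 = -b_2 R_2 C_2^2$ must be non-zero, which corresponds to the identifiability condition of Section 23.4.3.

```python
def circuit_theta(R1, C1, R2, C2):
    """Model parameters (a1, a2, b1, b2) of the circuit, Section 23.4.4."""
    return (R1 * C1 + R2 * C2 + R1 * C2, R1 * C1 * R2 * C2, C1 + C2, R2 * C1 * C2)


def circuit_p(a1, a2, b1, b2):
    """Process coefficients (R1, C1, R2, C2) from the model parameters."""
    d = a1 * b2 - a2 * b1                  # equals b2 * R2 * C2
    n = b2**2 - a1 * b1 * b2 + b1**2 * a2  # equals -b2 * R2 * C2**2
    if b2 == 0.0 or d == 0.0 or n == 0.0:
        raise ValueError("process coefficients not identifiable")
    R1 = a2 / b2
    R2 = -d**2 / (b2 * n)
    C1 = b2**2 / d
    C2 = -n / d
    return R1, C1, R2, C2


p_true = (1.0e3, 1.0e-6, 2.2e3, 4.7e-7)   # hypothetical R1 [Ohm], C1 [F], R2 [Ohm], C2 [F]
print(circuit_p(*circuit_theta(*p_true)))  # recovers p_true up to rounding
```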

An example with the axis of an industrial robot is given in [23.5].

23.5 From Parallel to Hierarchical Rule Structures²

This section introduces the idea of hierarchical rule bases for classification. These
structures are characterized by a sequential, tree-structured arrangement of rules.
They are easier to understand and usually of lower complexity than the traditional,
parallel rule-based decision systems.

To illustrate the concept, simple binary problems are considered. The classification
problem is to decide only between two classes, $C_1$ and $C_2$. There is no state in
between the two and no fuzzy transition from one to the other.

In the next section, normal parallel rules are examined first. This is followed by a
problem that is described both in a parallel and in a hierarchical structure. Using this
example, the structure and use of hierarchical, tree-structured rule bases, including the
role of the OR-operator, are examined.

² Compiled by Dominik Füssel, [23.3]
23.5.1 Parallel Rule Bases

Figure 23.7 gives a first example. The classification problem consists of distinguishing
two classes, $C_1$ and $C_2$, based on the given values $s_1$ and $s_2$.

The figure shows a grid-like partition of the space spanned by $s_1$ and $s_2$. This
partition is crisp. However, to keep a consistent notation, the wording is taken from
fuzzy logic: the intervals along the axes are named $A_{11}, A_{12}, \ldots$, standing for
attribute $A_{11}$, attribute $A_{12}$ and so on. For a better understanding, one can also keep
fuzzy descriptions like small, medium, large etc. in mind. But it is again emphasized
that no fuzzy set is involved in this example.



[Figure: partition of $s_1$ into $A_{11}, A_{12}, A_{13}, A_{14}$; shaded region of class $C_1$, remaining region of class $C_2$]

Fig. 23.7. Elementary binary classification problem


The crisp rule that describes the problem from Figure 23.7 is simple. It describes only
the region of class $C_1$; if $C_1$ is not detected, the result is automatically $C_2$. One can,
for instance, imagine $C_1$ as a fault situation, whereas $C_2$ is the fault-free case. The
rule is:

$$ R_1:\ \mathrm{IF}\ s_1 \text{ is } A_{11}\ \mathrm{AND}\ s_2 \text{ is } A_{21}\ \mathrm{THEN\ class}\ C_1 \qquad (23.29) $$

This rule $R_1$ describes the situation completely. A rule like (23.29) is similar to the
prototype Mamdani fuzzy rule. Indeed, normal fuzzy classifiers are composed of
similar rules (see Section 23.3).

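Typically, classifiers are built from a set of rules. A further example problem is given
in Figure 23.8. The corresponding rule base is now:

$$ \begin{aligned} R_1&:\ \mathrm{IF}\ s_1 \text{ is } A_{12}\ \mathrm{AND}\ s_2 \text{ is } A_{21}\ \mathrm{THEN\ class}\ C_1 \\ R_2&:\ \mathrm{IF}\ s_1 \text{ is } A_{12}\ \mathrm{AND}\ s_2 \text{ is } A_{22}\ \mathrm{THEN\ class}\ C_1 \end{aligned} \qquad (23.30) $$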

Each of the two rules describes one of the shaded rectangular regions. To obtain the
result of the rule base for a new data point, one has to evaluate the two rules and
finally combine their results. This step is called accumulation.

For classification problems, the accumulation is simply a union of the results of all
rules with the same conclusion. As such, it is an OR-operation, whose simplest
implementation is the maximum operator.

This accumulation combines the degrees of fulfillment of $R_1$ and $R_2$. For this simple
binary example, where a data point can only be inside or outside the shaded region of
Figure 23.8, it is identical to the binary (Boolean) OR-operator.
The interesting result for classification rule bases is that the accumulation (i.e. the
combination of the degrees of fulfillment of different rules) is identical to an
OR-operation in the rule premise. Indeed, the two rules from (23.30) can be written as
a single rule:

$$ R_1:\ \mathrm{IF}\ (s_1 \text{ is } A_{12}\ \mathrm{AND}\ s_2 \text{ is } A_{21})\ \mathrm{OR}\ (s_1 \text{ is } A_{12}\ \mathrm{AND}\ s_2 \text{ is } A_{22})\ \mathrm{THEN\ class}\ C_1 \qquad (23.31) $$

This clarifies the meanings of the two operators: an AND is used to shrink the
selected region, whereas the OR combines regions in the input (symptom) space to
describe the complete region belonging to $C_1$.
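The evaluation of a crisp parallel rule base, with AND as the minimum and accumulation as the maximum, can be sketched in Python. The numeric interval boundaries of the attributes are hypothetical, since Figures 23.7 and 23.8 give no values; only the rule structure of (23.30) is taken from the text.

```python
# Hypothetical crisp partition: attribute -> half-open interval [lo, hi)
ATTRIBUTES = {
    "A11": (0.0, 1.0), "A12": (1.0, 2.0), "A13": (2.0, 3.0), "A14": (3.0, 4.0),
    "A21": (0.0, 1.0), "A22": (1.0, 2.0), "A23": (2.0, 3.0),
}


def mu(attr, value):
    """Crisp membership degree (0.0 or 1.0) of value in attribute attr."""
    lo, hi = ATTRIBUTES[attr]
    return 1.0 if lo <= value < hi else 0.0


def parallel_rule_base(rules, s):
    """Degree of fulfillment of class C1 for the symptom vector s.

    Each rule is a list of (symptom index, attribute) pairs joined by AND
    (minimum); the rule results are accumulated by OR (maximum).
    """
    return max(min(mu(attr, s[i]) for i, attr in premise) for premise in rules)


# Rule base (23.30): both rules conclude class C1
RULES_2330 = [[(0, "A12"), (1, "A21")],
              [(0, "A12"), (1, "A22")]]

print(parallel_rule_base(RULES_2330, (1.5, 0.5)))  # 1.0 -> class C1
print(parallel_rule_base(RULES_2330, (2.5, 0.5)))  # 0.0 -> class C2
```

On every cell of the partition the result agrees with the single merged rule (23.31), i.e. "$s_1$ is $A_{12}$ AND ($s_2$ is $A_{21}$ OR $s_2$ is $A_{22}$)".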


[Figure: shaded regions of class $C_1$ and remaining region of class $C_2$ over the partition of $s_1$ (up to $A_{14}$) and $s_2$]

Fig. 23.8. First example: Binary classification requiring more than one rule.

23.5.2 Hierarchical Rule Bases 

With the same methodology one can now address the second example, pictured in
Figure 23.9. This time, the class regions have different shapes. In particular, the
shaded region can be decomposed into axis-orthogonal elements. This means, for
example, that the upper region of the symptom space depends on only one of the two
inputs, namely $s_2$. Such a situation occurs often: in higher-dimensional spaces, a
decision might not always depend on all inputs at the same time. In fault diagnosis
applications this means that some symptoms are irrelevant for certain faults in certain
regions of the symptom space.

The question arises how this can be explained. It essentially reflects the fact that
the problem is intrinsically of lower complexity than the given input dimensionality
would allow.

Coming back to the example of Figure 23.9, one easily determines rules similar to
(23.30):


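$$ \begin{aligned} R_1&:\ \mathrm{IF}\ s_1 \text{ is } A_{11}\ \mathrm{AND}\ s_2 \text{ is } A_{23}\ \mathrm{THEN\ class}\ C_1 \\ R_2&:\ \mathrm{IF}\ s_1 \text{ is } A_{12}\ \mathrm{AND}\ s_2 \text{ is } A_{23}\ \mathrm{THEN\ class}\ C_1 \\ R_3&:\ \mathrm{IF}\ s_1 \text{ is } A_{13}\ \mathrm{AND}\ s_2 \text{ is } A_{23}\ \mathrm{THEN\ class}\ C_1 \\ R_4&:\ \mathrm{IF}\ s_1 \text{ is } A_{14}\ \mathrm{AND}\ s_2 \text{ is } A_{23}\ \mathrm{THEN\ class}\ C_1 \\ R_5&:\ \mathrm{IF}\ s_1 \text{ is } A_{12}\ \mathrm{AND}\ s_2 \text{ is } A_{21}\ \mathrm{THEN\ class}\ C_1 \\ R_6&:\ \mathrm{IF}\ s_1 \text{ is } A_{12}\ \mathrm{AND}\ s_2 \text{ is } A_{22}\ \mathrm{THEN\ class}\ C_1 \end{aligned} \qquad (23.32) $$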

Again, the individual rule fulfillments must be combined by the accumulation. 
Fig. 23.9. Second example: Binary classification problem with more difficult shape.


The complexity of the example has increased: six rules are now necessary. The
problem is still understandable because it is two-dimensional and can be visualized
easily. Typical, however, are problems with more than two or three dimensions. These
cannot be visualized and are difficult to comprehend. A promising approach to make
them understandable is to divide the complex problem into smaller and simpler
subproblems. This creates a hierarchy that is typically easier to comprehend.

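In this example, the rules can be written in the following form:

$$ \begin{aligned} R_1^*&:\ \mathrm{IF}\ s_2 \text{ is } A_{23}\ \mathrm{THEN\ class}\ C_1 \\ R_2^*&:\ \mathrm{ELSE\ IF}\ s_1 \text{ is } A_{12}\ \mathrm{THEN\ class}\ C_1 \end{aligned} \qquad (23.33) $$

They describe exactly the same partition as Figure 23.9.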

Figure 23.10 shows the hierarchy of the two rules. The first describes the upper
shaded part; the second applies only to the lower region and must be evaluated only if
the first rule does not apply.



[Legend: ■ region of class $C_1$, □ region of class $C_2$]

Fig. 23.10. Sequential decomposition of rules for classification of second example problem.


This hierarchy of the two rules can be pictured as a tree structure, as visualized in
Figure 23.11. The tree is composed of rule premises at the nodes and the conclusions
(here class $C_1$ or $C_2$) at the leaves. To compute the output of the tree according to
(23.33), one has to traverse the tree from the root (at the top) down to the leaves.
Finally, the rule fulfillments at the leaves must be accumulated using the OR-operator.



Fig. 23.11. Tree representation of sequential rule base. 


But how can the large complexity reduction of (23.33) compared to (23.32) be
explained? There are essentially two reasons for the lower complexity:

1) The hierarchy expressed by the "ELSE" in the second rule of (23.33) essentially
hides complexity: it implies the negation of all rules above it. In parallel notation,
one would have to write:

$$ \begin{aligned} R_1'&:\ \mathrm{IF}\ s_2 \text{ is } A_{23}\ \mathrm{THEN\ class}\ C_1 \\ R_2'&:\ \mathrm{IF\ NOT}\ (s_2 \text{ is } A_{23})\ \mathrm{AND}\ s_1 \text{ is } A_{12}\ \mathrm{THEN\ class}\ C_1 \end{aligned} $$

2) Rules with a premise that does not contain a certain input reflect the independence
of the rule from this input. With such incomplete premises, a set of rules is
combined into one. In the example, $R_1, \ldots, R_4$ from (23.32) are replaced by $R_1^*$
from (23.33). In the same way, $R_5$ and $R_6$ collapse to $R_2^*$.

It should be noted that for this example the following rule tree can also be used:

$$ \begin{aligned} R_1^{**}&:\ \mathrm{IF}\ s_1 \text{ is } A_{12}\ \mathrm{THEN\ class}\ C_1 \\ R_2^{**}&:\ \mathrm{ELSE\ IF}\ s_2 \text{ is } A_{23}\ \mathrm{THEN\ class}\ C_1 \end{aligned} $$

Analogous to Figure 23.10, a similar rule sequence can be pictured. The result is the
same, but the decomposition into rules is different.
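The equivalence of the parallel rule base (23.32) and the two rule sequences can be checked exhaustively, because each data point falls into exactly one cell $A_{1i} \times A_{2j}$ of the crisp partition. In the Python sketch below a point is represented only by its attribute indices (i, j), which avoids assuming interval boundaries; the function names are illustrative.

```python
from itertools import product


def parallel_2332(i, j):
    """Six parallel rules (23.32), accumulated by OR; True means class C1.

    A point with s1 in A1i and s2 in A2j is represented by (i, j).
    """
    rules = [(1, 3), (2, 3), (3, 3), (4, 3), (2, 1), (2, 2)]
    return any((i, j) == premise for premise in rules)


def sequence_2333(i, j):
    """IF s2 is A23 THEN C1, ELSE IF s1 is A12 THEN C1, otherwise C2."""
    if j == 3:
        return True
    if i == 2:
        return True
    return False


def sequence_alternative(i, j):
    """IF s1 is A12 THEN C1, ELSE IF s2 is A23 THEN C1, otherwise C2."""
    if i == 2:
        return True
    if j == 3:
        return True
    return False


cells = list(product(range(1, 5), range(1, 4)))  # i = 1..4, j = 1..3
assert all(parallel_2332(i, j) == sequence_2333(i, j) == sequence_alternative(i, j)
           for i, j in cells)
print(sum(parallel_2332(i, j) for i, j in cells), "of", len(cells), "cells belong to C1")
# 6 of 12 cells belong to C1
```

The early returns play the role of the "ELSE": a later rule is only reached if all earlier ones failed.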

The example shows the potential for complexity reduction in hierarchical structures.
However, a complexity reduction is not guaranteed in all cases. The worst case is a
pattern that resembles a two-dimensional chessboard with alternating class regions.
Such a problem cannot be handled efficiently with tree structures; a parallel rule base
would be equally appropriate for it.

With the SELECT method that is described in Section 17.3.5, a neuro-fuzzy sys¬ 
tem is given that works with the hierarchical rule structure presented. However, dif¬ 
ferent from the binary example in this section the SELECT method will use the 
following features: 
• There will be fuzzy transitions between the individual class regions.

• The AND-operator will be a continuous fuzzy-AND neuron. 

• The resulting decision boundaries will not always be strictly axis-orthogonal. 

• The rule fulfillments will have a continuous value (output membership) between 
0 and 1. Therefore, the complete rule structure will always have to be computed. 
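The fuzzy counterpart of the rule chain can be sketched as follows. This is not the SELECT algorithm itself: the product used as AND, $1-\mu$ used as NOT and max used as OR are assumed stand-ins for SELECT's continuous fuzzy-AND neuron, and the membership functions for $A_{23}$ and $A_{12}$ are hypothetical. The sketch shows the consequence named in the last feature: every leaf yields a value between 0 and 1, so the whole rule structure is evaluated for every input.

```python
from typing import Callable, Dict, List, Tuple

Inputs = Tuple[float, float]  # (s1, s2)

def ramp_up(x: float, a: float, b: float) -> float:
    """Membership rising linearly from 0 at a to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership with support (a, d) and core [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical fuzzy sets with soft transitions between class regions.
def mu_A23(s: Inputs) -> float:
    return ramp_up(s[1], 1.5, 2.5)

def mu_A12(s: Inputs) -> float:
    return trapezoid(s[0], 0.5, 1.0, 2.0, 2.5)

def evaluate_fuzzy(rules: List[Tuple[Callable[[Inputs], float], str]],
                   default: str, s: Inputs) -> Dict[str, float]:
    """Fuzzy IF ... ELSE IF ... ELSE chain; every rule is always evaluated.

    Assumed operators: AND = product, NOT = 1 - mu, OR = max.
    """
    membership: Dict[str, float] = {}
    none_above = 1.0                      # fuzzy AND of NOT(premise) so far
    for premise, cls in rules:
        p = premise(s)
        leaf = none_above * p             # rule fulfillment in [0, 1]
        membership[cls] = max(membership.get(cls, 0.0), leaf)
        none_above *= 1.0 - p
    membership[default] = max(membership.get(default, 0.0), none_above)
    return membership

RULES = [(mu_A23, "C1"), (mu_A12, "C1")]
for point in [(0.5, 2.5), (3.0, 2.0), (1.5, 0.0)]:
    print(point, evaluate_fuzzy(RULES, "C2", point))
```

At (3.0, 2.0) the input lies in the transition zone of $A_{23}$, so both classes receive membership 0.5; unlike the crisp case, no leaf can be skipped.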



Concluding remarks 


This book treats basic methods for fault detection and diagnosis based on measured process signals. Various signal-model-based and process-model-based methods are described and illustrated with examples. Furthermore, basic structures of fault-tolerant systems are presented. Some application examples show experimental results obtained with different methods.

Appropriate fault detection and diagnosis methods have to be selected to match process-specific properties, such as static or dynamic behavior, input excitation and precision of modelling, as well as practical requirements, such as computational expense, sampling time, diagnosis depth and costs. In most cases, different methods have to be combined properly to reach a required fault coverage. Therefore, the book offers a selection of FDD methods that can be used to meet the practical needs of particular processes.

Because FDD methods can only be judged by applying them to concrete processes, a broad range of applications is treated in a second book:

Fault diagnosis of technical processes 
- Applications - 

which is also to be published by Springer-Verlag and is expected to appear around 2006. It treats the following examples:

A COMPONENTS 

- Fault detection of electrical drives; 

- Fault detection of electrical actuators; 

- Fault detection of fluidic actuators. 
B MACHINES AND PLANTS

- Fault detection of pumps; 

- Leak detection of pipelines; 

- Fault detection of industrial robots; 

- Fault detection of machine tools; 

- Fault detection of heat exchangers; 

- Fault detection for medical engineering devices. 

C AUTOMOTIVE SYSTEMS 

- Fault detection of combustion engines; 

- Fault detection of automobiles. 

D FAULT-TOLERANT SYSTEMS 

- Fault-tolerant systems; 

- Fly-by-wire and drive-by-wire systems. 

These applications show which methods are applicable and demonstrate, with experiments on real processes, how successful results can be obtained, but also which limits exist.